Graphics processing

ABSTRACT

An instruction, or set of instructions, that can be included in a program to perform a ray tracing acceleration data structure traversal, with individual execution threads in a group of execution threads executing the program performing a traversal operation for a respective ray in a corresponding group of rays such that the group of rays performs the traversal operation together. The instruction(s), when executed by the execution threads in respect of a node of the ray tracing acceleration data structure, cause one or more rays from the group of plural rays that are performing the traversal operation together to be tested for intersection with the one or more volumes associated with the node being tested. A result of the ray-volume intersection testing can then be returned for the traversal operation.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority pursuant to 35 U.S.C. 119(a) to United Kingdom Patent Application No. 2108050.2, filed Jun. 4th, 2021, which application is incorporated herein by reference in its entirety.

BACKGROUND

The technology described herein relates to graphics processing systems, and, in particular, to the rendering of frames (images) for display.

FIG. 1 shows an exemplary system on-chip (SoC) graphics processing system 8 that comprises a host processor in the form of a central processing unit (CPU) 1, a graphics processor (GPU) 2, a display processor 3 and a memory controller 5.

As shown in FIG. 1, these units communicate via an interconnect 4 and have access to off-chip memory 6. In this system, the graphics processor 2 will render frames (images) to be displayed, and the display processor 3 will then provide the frames to a display panel 7 for display.

In use of this system, an application 13, such as a game, executing on the host processor (CPU) 1 will, for example, require the display of frames on the display panel 7. To do this, the application will submit appropriate commands and data to a driver 11 for the graphics processor 2 that is executing on the CPU 1. The driver 11 will then generate appropriate commands and data to cause the graphics processor 2 to render appropriate frames for display and to store those frames in appropriate frame buffers, e.g. in the main memory 6. The display processor 3 will then read those frames into a buffer for the display from where they are then read out and displayed on the display panel 7 of the display.

One rendering process that may be performed by a graphics processor is so-called “ray tracing”. Ray tracing is a rendering process which involves tracing the paths of rays of light from a viewpoint (sometimes referred to as a “camera”) back through sampling positions in an image plane into a scene, and simulating the effect of the interaction between the rays and objects in the scene. The output data value, e.g., sampling point in the image, is determined based on the object(s) in the scene intersected by the ray passing through the sampling position, and the properties of the surfaces of those objects. The ray tracing calculation is complex, and involves determining, for each sampling position, a set of objects within the scene which a ray passing through the sampling position intersects.

Ray tracing is considered to provide better, e.g. more realistic, physically accurate images than more traditional rasterisation rendering techniques, particularly in terms of the ability to capture reflection, refraction, shadows and lighting effects. However, ray tracing can be significantly more processing-intensive than traditional rasterisation.

The Applicants believe that there remains scope for improved techniques for performing ray tracing using a graphics processor.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the technology described herein will now be described by way of example only and with reference to the accompanying drawings, in which:

FIG. 1 shows an exemplary graphics processing system;

FIG. 2 is a schematic diagram illustrating a “full” ray tracing process;

FIG. 3 shows an exemplary ray tracing acceleration data structure;

FIG. 4 is a flow chart illustrating an embodiment of a full ray tracing process;

FIG. 5 is a schematic diagram illustrating a “hybrid” ray tracing process;

FIG. 6 shows schematically an embodiment of a graphics processor that can be operated in the manner of the technology described herein;

FIG. 7 shows schematically in more detail elements of a graphics processor that can be operated in the manner of the technology described herein;

FIG. 8 shows schematically a stack layout that may be used for managing a ray tracing traversal operation;

FIG. 9 is a flowchart showing the operation of a graphics processor in an embodiment of the technology described herein;

FIG. 10 is a flowchart showing a ray-volume intersection testing operation according to an embodiment of the technology described herein;

FIG. 11 is a flowchart showing a ray-primitive intersection testing operation according to an embodiment of the technology described herein;

FIG. 12 shows an embodiment of a shader program compilation process;

FIG. 13 illustrates how child node volume data for an internal (non-leaf) node of a ray tracing acceleration data structure may be stored in memory;

FIG. 14 is a flowchart showing how child node volume data for an internal (non-leaf) node of a ray tracing acceleration data structure may be obtained from memory according to an embodiment of the technology described herein; and

FIG. 15 illustrates how primitive data for a leaf node of a ray tracing acceleration data structure may be stored in memory.

Like reference numerals are used for like elements in the Figures where appropriate.

DESCRIPTION

A first embodiment of the technology described herein comprises a method of operating a graphics processing system including a graphics processor when rendering a frame that represents a view of a scene comprising one or more objects using a ray tracing process, wherein the ray tracing process uses a ray tracing acceleration data structure indicative of the distribution of geometry for the scene to be rendered to determine geometry for the scene that may be intersected by a ray being used for a ray tracing operation, the ray tracing acceleration data structure comprising a plurality of nodes, each node associated with a respective one or more volumes within the scene, the ray tracing process comprising performing for a plurality of rays a traversal of the ray tracing acceleration data structure to determine, by testing the rays for intersection with the volumes represented by the nodes of the acceleration data structure, geometry for the scene to be rendered that may be intersected by the rays;

the graphics processor comprising a programmable execution unit operable to execute programs to perform graphics processing operations, and in which a program can be executed by groups of plural execution threads together;

the method comprising:

including in a program to perform a ray tracing acceleration data structure traversal, wherein the program is to be executed by a group of plural execution threads, with individual execution threads in the group of execution threads performing a traversal operation for a respective ray in a corresponding group of rays such that the group of rays performs the ray tracing operation together, a set of one or more ‘ray-volume testing’ instructions for testing rays for intersection with the one or more volumes associated with a given node of the ray tracing acceleration data structure that is to be tested during the traversal operation, which set of ‘ray-volume testing’ instructions, when executed by execution threads of the group of plural execution threads, will cause:

the graphics processor to test one or more rays from the group of plural rays that are performing the traversal operation together for intersection with the one or more volumes associated with the node being tested; and

a result of the intersection testing to be returned for the node for the traversal operation;

the method further comprising, when a group of execution threads is executing the program for a corresponding group of rays that are performing a traversal of the ray tracing acceleration data structure together, in response to the execution threads executing the set of one or more ‘ray-volume testing’ instructions in respect of a node of the ray tracing acceleration data structure:

testing one or more rays from the group of plural rays that are performing the traversal operation together for intersection with the one or more volumes associated with the node being tested; and

returning a result of the intersection testing for the node for the traversal operation.

A second embodiment of the technology described herein comprises a graphics processing system comprising a graphics processor that is operable to render a frame that represents a view of a scene comprising one or more objects using a ray tracing process, wherein the ray tracing process uses a ray tracing acceleration data structure indicative of the distribution of geometry for the scene to be rendered to determine geometry for the scene that may be intersected by a ray being used for a ray tracing operation, the ray tracing acceleration data structure comprising a plurality of nodes, each node associated with a respective one or more volumes within the scene, the ray tracing process comprising performing for a plurality of rays a traversal of the ray tracing acceleration data structure to determine, by testing the rays for intersection with the volumes represented by the nodes of the acceleration data structure, geometry for the scene to be rendered that may be intersected by the rays;

the graphics processor comprising:

a programmable execution unit operable to execute programs to perform graphics processing operations, and in which a program can be executed by groups of plural execution threads together; and the graphics processing system further comprising:

a processing circuit that is configured to:

include in a program to perform a ray tracing acceleration data structure traversal, wherein the program is to be executed by respective groups of plural execution threads together, with individual execution threads in a group of execution threads performing a traversal operation for a respective ray in a corresponding group of rays such that the group of rays performs the traversal operation together, a set of one or more ‘ray-volume testing’ instructions for testing rays for intersection with the one or more volumes associated with a given node of the ray tracing acceleration data structure that is to be tested during the traversal operation, which set of ‘ray-volume testing’ instructions, when executed by execution threads of a group of plural execution threads, will cause: the graphics processor to test one or more rays from the group of plural rays that are performing the traversal operation together for intersection with the one or more volumes associated with the node being tested; and a result of the intersection testing to be returned for the node for the traversal operation;

wherein the execution unit is configured such that, when a group of execution threads is executing the program for a corresponding group of rays that are performing a traversal of the ray tracing acceleration data structure together, in response to the execution threads executing the set of one or more ‘ray-volume testing’ instructions in respect of a node of the ray tracing acceleration data structure:

the execution unit triggers testing of one or more rays from the group of plural rays that are performing the traversal of the ray tracing acceleration data structure together for intersection with the one or more volumes associated with the node being tested, wherein a result of the intersection testing is then returned for the node for the traversal operation.

The technology described herein broadly relates to the performing of ray tracing on a graphics processor in order to render a frame that represents a view of a particular scene. When performing a ray tracing operation, for each ray that is being used to render a sampling position in the frame that is being rendered, in order to render the sampling position, it first needs to be determined which geometry that is defined for the scene is intersected by the ray (if any).

There are various ways in which this can be done, as desired. However, in general, there may be many millions of graphics primitives within a given scene, and millions of rays to be tested, such that it is not normally practical to test every ray against each and every graphics primitive. To speed up the ray tracing operation the technology described herein therefore uses a ray tracing acceleration data structure, such as a bounding volume hierarchy (BVH), that is representative of the distribution of the geometry in the scene that is to be rendered to determine the intersection of rays with geometry (e.g. objects) in the scene being rendered (and then renders sampling positions in the output rendered frame representing the scene accordingly).

The ray tracing operation according to the technology described herein therefore generally comprises performing a traversal of the ray tracing acceleration data structure for a plurality of rays that are being used for the ray tracing process, which traversal involves testing the rays for intersection with the volumes represented by the different nodes of the ray tracing acceleration data structure in order to determine with reference to the node volumes which geometry may be intersected by which rays for a sampling position in the frame for the scene that is being rendered, and which geometry therefore needs to be further processed for the rays for the sampling position.

The ray tracing acceleration data structure traversal operation therefore involves traversing the nodes of the ray tracing acceleration data structure, testing rays for intersection with the volumes associated with the nodes, and maintaining a record of which node volumes are intersected by which rays, e.g. to determine which nodes should therefore be tested next for the ray, and so on, down to the end nodes, e.g., at the lowest level, of the ray tracing acceleration data structure.

For example, and in an embodiment, the ray tracing acceleration data structure comprises a tree structure that is configured such that each leaf node of the tree structure represents a set of primitives defined within the respective volume that the leaf node corresponds to, and with the non-leaf nodes representing hierarchically-arranged larger volumes up to a root node at the top level of the tree structure that represents an overall volume for the scene in question that the tree structure corresponds to. Each non-leaf node is therefore in an embodiment a parent node for a respective set of plural child nodes with the parent node volume encompassing the volumes of its respective child nodes. In an embodiment, each (non-leaf) node is therefore associated with a respective plurality of child node volumes, each representing a (in an embodiment non-overlapping) sub-volume within the overall volume represented by the node in question.
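By way of illustration only, a tree structure of this kind might be sketched in Python as follows. The `AABB` and `Node` classes, their field names and the `is_well_formed` check are hypothetical names introduced for this sketch and are not taken from the technology described herein; axis-aligned boxes are assumed as the node volumes, which is only one possible choice.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class AABB:
    """Axis-aligned box; assumed here as the volume type for every node."""
    lo: Vec3
    hi: Vec3

    def encloses(self, other: "AABB") -> bool:
        # True when 'other' lies entirely inside this box on all three axes.
        return all(self.lo[a] <= other.lo[a] and other.hi[a] <= self.hi[a]
                   for a in range(3))


@dataclass
class Node:
    volume: AABB
    children: List["Node"] = field(default_factory=list)  # set for non-leaf nodes
    primitives: List[int] = field(default_factory=list)   # primitive ids for leaf nodes

    @property
    def is_leaf(self) -> bool:
        return not self.children


def is_well_formed(node: Node) -> bool:
    """Check that every parent volume encompasses the volumes of its children."""
    return all(node.volume.encloses(c.volume) and is_well_formed(c)
               for c in node.children)


# A toy two-level tree: a root volume with two non-overlapping leaf volumes.
left = Node(AABB((0, 0, 0), (1, 1, 1)), primitives=[0, 1])
right = Node(AABB((2, 0, 0), (3, 1, 1)), primitives=[2])
root = Node(AABB((0, 0, 0), (3, 1, 1)), children=[left, right])
```

In this sketch the root plays the role of the overall scene volume, and each leaf simply lists the identifiers of the primitives that fall within its volume.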

In that case, the ray tracing acceleration data structure can thus be (and in an embodiment is) traversed by proceeding down the branches of the tree structure and testing the rays against the child volumes associated with a node at a first level of the tree structure to thereby determine which child nodes in the next level of the tree structure should be tested, and so on, down to the level of the respective leaf nodes at the end of the branches of the tree structure.

That is, in embodiments, where the ray tracing acceleration data structure comprises a tree structure of this type, the testing of the one or more rays against the volume associated with a node comprises testing the rays against each of the child node volumes associated with the node, and outputting a result of the testing for each of the child nodes of the node in question.

(For ease of explanation, various embodiments will accordingly be described wherein the ray-volume intersection testing for a node involves testing one or more rays against each of the child node volumes of the node in question. However, it will be appreciated that other arrangements would be possible, e.g. depending on how the ray tracing acceleration data structure is configured, and the one or more volumes that are tested for a given node may be any suitable volumes associated with the node, such as the volume represented by the node itself rather than the volumes represented by its child nodes.)

Once it has been determined by performing such a traversal operation for a ray which end nodes represent geometry that may be intersected by a ray, the actual geometry intersections for the ray for the geometry that occupies the volumes associated with the intersected end nodes can be determined accordingly, e.g. by testing the ray for intersection with the individual units of geometry (primitives) defined for the scene that occupy the volumes associated with the leaf nodes. Once the geometry intersections for the rays being used to render a sampling position have been determined, it can then be (and is) determined what appearance the sampling position should have, and the sampling position rendered accordingly.

The ray-primitive intersection testing is generally relatively more computationally expensive. The use of the ray tracing acceleration data structure in the technology described herein therefore allows the ray tracing operation to be accelerated.

For instance, rather than testing a ray against each and every individual primitive within the scene, a ray that is being used for the ray tracing process can instead be tested for intersection at a higher level against the volumes represented at each level of the tree data structure, and for any rays that do not intersect a given node in a particular branch of the tree structure it can be determined that the ray does not intersect the geometry falling within the branch of the tree structure including that node, without further testing of the ray against the geometry in the lower levels of the tree.

The use of such a ray tracing acceleration data structure in the technology described herein can therefore be effective in speeding up the overall ray tracing operation.

However, the Applicants have recognised that there is still scope for improvement in this regard. In particular, the technology described herein recognises that when performing a traversal of the ray tracing acceleration data structure for a ray, in typical cases, the ray tracing acceleration data structure is, or at least can be, configured such that the majority of the processing time is spent performing the intersection testing with the volumes of the nodes of the ray tracing acceleration data structure, and that this ray-volume testing may therefore require the bulk of the processing effort.

Furthermore, loading in the required data for each of the rays being used for the ray tracing process for performing such ray-volume intersection testing can involve relatively high memory bandwidth.

The technology described herein therefore aims to improve the overall efficiency of the ray-volume intersection testing operations that are required to be performed when performing a traversal of the ray tracing acceleration data structure during the ray tracing operation.

This is achieved in the technology described herein by having a group of plural rays that are being used for the ray tracing process perform a traversal of the ray tracing acceleration data structure together, as part of a single traversal operation.

This has the effect and benefit that multiple rays in the group of plural rays that are performing the traversal at the same time can then be tested against a given node of the ray tracing acceleration data structure in one processing instance, thus reducing the number of memory access operations.

For instance, this means that where there are multiple rays in the group of plural rays that should be tested for a given node, all of those rays can be tested against the node in a single testing instance. Correspondingly this then means that the graphics processor is able to load in all of the data for those rays from memory in one go, e.g. in a single memory load operation.

Likewise, the result of the intersection testing can be returned for all of the rays (and all of the volumes) being tested, and then stored accordingly.

In this way it is possible to reduce the overall number of memory access operations that may be required for testing all of the rays, e.g. at least compared to other possible arrangements where the traversal and ray-volume testing is performed in respect of individual rays, and which arrangements may therefore require a significant number of memory accesses for loading in (and subsequently writing out) the required data for each of the rays for each instance of intersection testing.
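The potential saving can be illustrated with a deliberately simplified counting model. The sketch below assumes, purely for illustration, that each visit to a node costs one load of that node's data, that there is no caching, and that the set of nodes each ray needs to visit is known in advance; the function name `count_node_loads` and the node labels are hypothetical.

```python
def count_node_loads(nodes_needed_per_ray):
    """Compare node-data loads for per-ray traversal against grouped traversal.

    Assumed model: one load per node visit, no caching.
    """
    # Rays traversing independently: every ray loads every node it visits.
    per_ray = sum(len(nodes) for nodes in nodes_needed_per_ray)
    # Rays traversing together: each node is loaded once for the whole group.
    grouped = len(set().union(*nodes_needed_per_ray))
    return per_ray, grouped


# Four rays in a group; three of them follow the same branch of the tree.
needed = [
    {"root", "A", "A1"},
    {"root", "A", "A2"},
    {"root", "A", "A1"},
    {"root", "B"},
]
per_ray, grouped = count_node_loads(needed)
```

Under these assumptions the four independent traversals load node data 11 times in total, whereas the grouped traversal loads each of the 5 distinct nodes once. The benefit in practice would depend on how similar the rays in the group are and on the memory system in question.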

This also therefore allows processing resource for the group of rays to be shared. For instance, and in an embodiment, by performing the traversal operation for a group of rays together, the traversal operation can thus be managed using a single, common data structure (a ‘traversal record’) for the group of rays that tracks which nodes are associated with geometry that is potentially intersected by the rays in the group of rays (and which nodes/geometry thus need to be tested (next) for the traversal operation).

Again, this can further reduce memory bandwidth since there is in embodiments a single data structure (e.g. a common record of which nodes (geometry) are potentially intersected by rays in the group of rays, which record may generally take any suitable form but in an embodiment is in the form of a ‘stack’) that manages the traversal operation for the whole group of rays, thus reducing the number of memory accesses, e.g. compared to other possible arrangements wherein respective data structures (e.g. stacks) are provided for the individual rays, e.g. as may be required if the rays are performing the traversal operation independently.

This can also in embodiments facilitate managing the traversal operation via local registers, again reducing the need to access external memory. For instance, and in embodiments, the traversal operation can be managed via a set of common registers allocated to the execution thread group in which the shared traversal record (the traversal data structure, e.g. stack) for the group of rays is maintained, as will be explained further below.
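A shared traversal record of this kind might be sketched as a single stack that serves the whole group of rays. The sketch below is an assumption-laden illustration rather than the layout used by the technology described herein: each stack entry is assumed to pair a node with a bitmask of the rays that may intersect it, the tree is given as a plain dictionary, and the ray-volume test is abstracted behind a hypothetical `hits(ray, node)` callback.

```python
def traverse_group(children, hits, num_rays, root=0):
    """Traverse a tree for a group of rays using one shared stack.

    children: dict mapping node id -> list of child node ids (empty for a leaf)
    hits:     callable (ray index, node id) -> bool, standing in for the
              ray-volume intersection test
    Returns a dict mapping leaf id -> bitmask of rays that may hit that leaf.
    """
    full_mask = (1 << num_rays) - 1
    stack = [(root, full_mask)]      # the single record shared by the group
    leaf_candidates = {}
    while stack:
        node, mask = stack.pop()
        for child in children[node]:
            child_mask = 0
            for ray in range(num_rays):
                # Only rays that reached the parent are tested against the child.
                if (mask >> ray) & 1 and hits(ray, child):
                    child_mask |= 1 << ray
            if child_mask == 0:
                continue             # no ray in the group hits this child
            if children[child]:
                stack.append((child, child_mask))
            else:
                leaf_candidates[child] = child_mask
    return leaf_candidates


# Toy tree: node 0 is the root; nodes 3, 4 and 5 are leaves.
tree = {0: [1, 2], 1: [3, 4], 2: [5], 3: [], 4: [], 5: []}
# Which nodes each ray's path intersects (ray 3 misses everything).
ray_hits = [{1, 3}, {1, 4}, {2, 5}, set()]
found = traverse_group(tree, lambda ray, node: node in ray_hits[ray], num_rays=4)
```

Because there is only one stack for the group, its size does not grow with the number of rays, which is what would make it plausible to hold such a record in a small set of registers shared by the thread group.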

The approach according to the technology described herein can therefore provide various benefits especially in terms of reducing memory bandwidth.

Thus, in the technology described herein, the ray tracing traversal operation is performed for a group of plural rays together. In particular, the ray tracing traversal program is executed by a group of plural execution threads, with each ray in the group of plural rays being processed by a corresponding execution thread in a group of plural execution threads that are executing the program at the same time.

Furthermore, each of the execution threads (for each of the rays in the group) is in an embodiment arranged to continue executing the program for the traversal operation (i.e. so that all of the threads in the execution thread group are, and remain, in an ‘active’ state) until the group of rays as a whole has completed the traversal of the ray tracing acceleration data structure.

In other words, in the technology described herein, the whole execution thread group for the group of plural rays is in an embodiment kept ‘active’ such that all of the rays in the group of plural rays effectively perform the traversal operation together, as a group, even if it is determined that a given ray in the group of plural rays does not intersect any geometry in the ray tracing acceleration data structure (such that the execution thread processing that ray could in principle be retired/terminated from that point). To facilitate this, one or more instructions are included in the program that cause the execution threads to be and remain in the active state, at least until all of the rays in the group of rays that are performing the traversal operation together have finished the traversal, e.g. at least until it has been determined for all of the rays which geometry may be intersected by the rays.

Thus, in embodiments, the method comprises (and the processing circuit of the graphics processing system is configured to perform a step of) including in a program to perform a ray tracing acceleration data structure traversal, wherein the program is to be executed by a group of plural execution threads, with individual execution threads in the group of execution threads performing a traversal operation for a respective ray in a corresponding group of rays, a set of one or more instructions that cause the execution threads in the group of execution threads to be in an active state at least until the traversal operation to determine which, if any, geometry for the scene may be intersected by the rays is finished for all of the rays in the group of rays being processed by the group of execution threads. In this way, the group of rays is caused to perform the (entire) traversal operation together.

This could be achieved in any suitable manner, as desired. For instance, the program instructions may be arranged such that it is ensured that the threads are not caused to diverge, or be terminated, and thus remain in the active state for the traversal operation. Or, and in some embodiments, explicit instructions, or instruction modifiers, may be included into the program that when executed force all of the execution threads in the execution thread group to be in the active state (e.g. such that if an execution thread in the execution thread group had terminated, the thread can be brought back into the active state for the traversal operation). Various arrangements would be possible in that regard.

When a group of execution threads are in the active state, processing operations can then be performed using the group of threads as a whole, e.g. with processing resource shared across the thread groups. For example, all of the execution threads may be arranged to perform the same processing operation, but for different data points, e.g., in a single instruction, multiple data (SIMD) execution state.

Ensuring that all of the threads that are performing the traversal operation for a respective ray in the group of rays are in an active state may help to simplify the implementation of the traversal operation and has the additional benefit that the traversal operation can then be, and in an embodiment is, managed for the group of plural rays as a whole, e.g. using a shared data structure.

Thus, in the technology described herein, all of the execution threads for the rays in the group of plural rays in an embodiment remain active at least until all of the required ray-volume intersection testing for the rays in the group of plural rays has been performed to determine which geometry may be intersected by the rays (or, correspondingly, if it is determined during the traversal that none of the rays intersect any of the geometry, such that there is a ‘miss’, the traversal operation is then finished once it has been determined that none of the rays in the group intersect any geometry).

In some embodiments the execution threads for all of the rays in the group of plural rays remain active beyond the initial traversal operation to determine which geometry may be intersected by the rays, e.g., and in an embodiment, also remain active for the subsequent testing of the rays against the individual units of geometry (primitives) contained within the volumes that it is determined that the rays intersect.

That is, in embodiments, both the ray-volume intersection testing that is performed during the traversal operation to determine which geometry may be intersected and the subsequent ray-primitive intersection testing that is performed to determine the actual geometry intersections are performed for a group of plural rays as a whole.

In an embodiment both of these operations are triggered by the same shader program. For instance, the program is in an embodiment executed by a group of plural execution threads for a corresponding group of rays that are thereby arranged to perform the traversal of the ray tracing acceleration data structure together to determine a set of volumes (e.g. the leaf node volumes of a tree structure) representing respective subsets of geometry that may be intersected by at least one ray in the group of plural rays and also causes the graphics processor to perform the required ray-primitive intersection testing for the geometry occupying those volumes to determine the actual geometry intersections (if any).

These steps may be performed separately, e.g. in sequence, e.g. such that it is first determined which subsets of geometry (e.g. which end nodes of the ray tracing acceleration data structure) may be intersected, and then only after all of the traversals for all of the rays in the group of rays performing the traversal to determine which end nodes represent volumes containing geometry that may be intersected by a ray in the group of plural rays have finished, is the graphics processor then caused to perform the required ray-primitive intersection testing for the rays in respect of those nodes.

However, it would also be possible to perform these steps in parallel, as part of the same overall traversal operation, and this is done in embodiments. For instance, as part of the program execution, during a traversal of the ray tracing acceleration data structure, depending on whether a given node to be tested corresponds to a parent node in the ray tracing acceleration data structure or an end (leaf) node at the lowest level of the ray tracing acceleration data structure, the graphics processor may then perform the required ray-volume and/or ray-primitive intersection testing accordingly.

For example, in response to the traversal operation reaching a parent node, the graphics processor may then be caused to test the rays in the group of plural rays against the respective one or more child volumes associated with that node to determine which child nodes should be tested against (the ray-volume intersection testing). On the other hand, in response to the traversal operation reaching an end node of the ray tracing acceleration data structure, the graphics processor may then test the rays for intersection with the individual units of geometry (primitives) represented by the end node in question (the ray-primitive intersection testing).

Thus, in embodiments, the ray-primitive testing in respect of a particular end node for one (a first) branch of the tree may be performed before the traversal operation has fully worked through all of the branches of the tree structure to determine which other, if any, end nodes may be intersected by the rays.

For instance, the traversal operation may be performed such that the traversal in an embodiment works down a first branch of the tree structure to determine whether there is any geometry for the branch that is potentially intersected and then proceeds to determine the actual geometry intersections for that branch, before moving to the next (i.e. the adjacent) branch, and so on.
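A combined traversal of this kind, in which parent nodes trigger ray-volume testing and end nodes trigger ray-primitive testing branch by branch, might be sketched as follows. The sketch makes several assumptions that are not taken from the text above: node volumes are axis-aligned boxes tested with the slab method, primitives are triangles tested with the well-known Möller-Trumbore algorithm, nodes are plain dictionaries, and all function and key names are hypothetical.

```python
import math


def sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
def cross(a, b): return (a[1] * b[2] - a[2] * b[1],
                         a[2] * b[0] - a[0] * b[2],
                         a[0] * b[1] - a[1] * b[0])


def hits_box(o, d, lo, hi):
    """Ray-volume test (slab method) against an axis-aligned box."""
    t_near, t_far = 0.0, math.inf
    for a in range(3):
        if d[a] == 0.0:
            if not (lo[a] <= o[a] <= hi[a]):
                return False
            continue
        t0, t1 = (lo[a] - o[a]) / d[a], (hi[a] - o[a]) / d[a]
        t_near, t_far = max(t_near, min(t0, t1)), min(t_far, max(t0, t1))
        if t_near > t_far:
            return False
    return True


def hits_triangle(o, d, v0, v1, v2, eps=1e-9):
    """Ray-primitive test (Moller-Trumbore); returns hit distance or None."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(d, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None                      # ray is parallel to the triangle
    s = sub(o, v0)
    u = dot(s, p) / det
    q = cross(s, e1)
    v = dot(d, q) / det
    if u < 0 or v < 0 or u + v > 1:
        return None                      # outside the triangle
    t = dot(e2, q) / det
    return t if t > eps else None


def traverse(root, rays):
    """Depth-first group traversal; returns the closest hit distance per ray."""
    closest = [None] * len(rays)
    stack = [(root, list(range(len(rays))))]
    while stack:
        node, lanes = stack.pop()
        if "triangles" in node:          # end node: ray-primitive testing
            for i in lanes:
                for tri in node["triangles"]:
                    t = hits_triangle(*rays[i], *tri)
                    if t is not None and (closest[i] is None or t < closest[i]):
                        closest[i] = t
        else:                            # parent node: ray-volume testing
            # Push in reverse so the first child's branch is finished first.
            for child in reversed(node["children"]):
                hit = [i for i in lanes if hits_box(*rays[i], *child["box"])]
                if hit:
                    stack.append((child, hit))
    return closest


leaf_a = {"box": ((-1, -1, -0.1), (1, 1, 0.1)),
          "triangles": [((-1, -1, 0), (1, -1, 0), (0, 1, 0))]}
leaf_b = {"box": ((4, -1, -0.1), (6, 1, 0.1)),
          "triangles": [((4, -1, 0), (6, -1, 0), (5, 1, 0))]}
scene = {"box": ((-1, -1, -0.1), (6, 1, 0.1)), "children": [leaf_a, leaf_b]}
rays = [((0, 0, -5), (0, 0, 1)), ((5, 0, -2), (0, 0, 1)), ((20, 0, -5), (0, 0, 1))]
distances = traverse(scene, rays)
```

In this sketch the same stack drives both kinds of test, so nothing needs to be stored between the ray-volume and ray-primitive phases; the third ray misses both child volumes and therefore never reaches any triangle test.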

Thus, the traversal operation that is performed for a group of plural rays according to the technology described herein in an embodiment comprises both the ray-volume testing to determine which geometry (e.g. which leaf node volumes) may be intersected by the rays in the group of plural rays for which the traversal is being performed and the ray-primitive intersection testing to determine the actual geometry intersections.

This may help simplify the traversal operation, e.g. by allowing both operations to be managed using the same data structure (e.g. the same traversal record), thus avoiding the need to store out any data in between the ray-volume and ray-primitive testing operations.

However, this is not necessary, and it would also be possible for the group of rays to perform the traversal operation up to and including the step of determining, by performing suitable ray-volume intersection testing, which geometry may be intersected by the rays in the group of rays as a whole, together, but for the rays to, e.g., then be re-grouped for the ray-primitive intersection testing, or, e.g., for the ray-primitive testing to be performed instead for individual rays.

To further facilitate improving the efficiency of the ray-volume intersection testing that is to be performed during a traversal operation, the technology described herein also provides a dedicated set of one or more ‘ray-volume testing’ instructions that can be included within a shader program for the ray tracing operation, and which set of instructions cause the graphics processor to perform the required intersection testing between one or more rays within the group of plural rays that are performing the traversal together and the volumes associated with a node of the ray tracing acceleration data structure for the rays for which the shader program is being executed, e.g. as above.

The use of an instruction for performing the ray-volume intersection testing may be beneficial in itself in terms of improving the efficiency of the program compilation/execution, e.g. as there may generally be a huge number of such tests that need to be performed, such that the use of a dedicated instruction that can be included into the program to trigger such testing can help reduce the complexity of the shader program. Thus, whenever ray-volume intersection testing is required in respect of a node of the ray tracing acceleration data structure, the instruction can be included appropriately into the generated program to cause the graphics processor to perform the required ray-volume intersection testing.

In response to executing such a ray-volume intersection testing instruction, the relevant input data defining the rays to be tested can then be loaded in, together with data indicative of the volumes associated with the node that is being tested against, and the ray-volume intersection testing can then be performed accordingly.

Subject to the requirements of the technology described herein, the ray-volume intersection testing itself can generally be performed in any suitable way, as desired, e.g. in the normal way for such ray tracing operations and for the graphics processor and graphics processing system in question.

The ray-volume intersection testing thus in an embodiment takes as input a node of the ray tracing acceleration data structure that is desired to be tested against (which node will typically be a parent or non-leaf node) and then tests one or more rays in the group of plural rays that are performing the traversal together for intersection against one or more volumes associated with the node being tested.

For example, as mentioned above, in the case of a tree structure, the rays may be, and in an embodiment are, tested for intersection against each of the child node volumes of the node in question. This means that when the ray-volume intersection testing instruction is executed, the graphics processor can then, and in an embodiment does, test multiple rays from the group of plural rays against the volumes for plural child nodes of the node in question, in one processing instance (e.g., and in an embodiment, with one load operation for loading all of the data for this testing and a single output for all of the testing). For instance, and for this reason, in embodiments, the ray tracing acceleration data structure comprises a relatively ‘wide’ tree structure. In an embodiment, each parent node in the tree may have up to six respective child nodes. In such cases the testing is in an embodiment performed iteratively for each ray for each child node volume until all of the rays in the group of rays have been tested against each child node volume. The result of the ray-volume intersection testing is then returned appropriately for the program performing the traversal operation to continue.
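By way of illustration only, the iterative testing of each ray against each child node volume might be sketched as follows. The sketch assumes, purely for the purposes of the example, that the child node volumes are axis-aligned bounding boxes and that the result is returned as one bit mask of intersecting rays per child volume; the names `Ray`, `Box`, `ray_hits_box` and `ray_volume_test` are hypothetical and do not correspond to any actual instruction or hardware interface.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
MAX_CHILDREN = 6  # "up to six respective child nodes" per parent node


@dataclass
class Ray:
    origin: Vec3
    direction: Vec3
    t_min: float = 0.0
    t_max: float = float("inf")


@dataclass
class Box:
    # Assumption: each child node volume is an axis-aligned box.
    lo: Vec3
    hi: Vec3


def ray_hits_box(ray: Ray, box: Box) -> bool:
    """Slab test: intersect the ray's parameter interval with each axis slab."""
    t_near, t_far = ray.t_min, ray.t_max
    for o, d, lo, hi in zip(ray.origin, ray.direction, box.lo, box.hi):
        if d == 0.0:
            # Ray is parallel to this slab: it must start inside it.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def ray_volume_test(rays: Sequence[Ray], child_boxes: Sequence[Box]) -> List[int]:
    """Test every ray in the group against every child volume of one node.

    Returns one bit mask per child volume; bit i is set if ray i hits it.
    """
    if len(child_boxes) > MAX_CHILDREN:
        raise ValueError("a parent node has at most six child volumes here")
    masks = []
    for box in child_boxes:
        mask = 0
        for lane, ray in enumerate(rays):
            if ray_hits_box(ray, box):
                mask |= 1 << lane
        masks.append(mask)
    return masks


if __name__ == "__main__":
    group = [Ray((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
             Ray((0.0, 0.0, 0.0), (0.0, 1.0, 0.0))]
    children = [Box((1.0, -1.0, -1.0), (2.0, 1.0, 1.0)),
                Box((-1.0, 1.0, -1.0), (1.0, 2.0, 1.0))]
    print(ray_volume_test(group, children))  # [1, 2]
```

In this sketch a single call covers all of the rays in the group and all of the child volumes of the node, loosely mirroring the single load and single output described above; a child whose mask is zero is not intersected by any ray in the group.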

In this way, by being able to test multiple rays in a single testing operation, in an embodiment against multiple (child node) volumes, it is therefore possible to improve the overall speed of the ray-volume testing, at least on average.

On the other hand, by having all of the execution threads remain in an active state during the traversal, such that the plural rays perform the traversal as a group, this may then mean that some of the rays in the group of plural rays that perform the traversal do not intersect with a given node that is being tested, and it may even be the case that some of the rays in the group do not intersect with any of the geometry for the scene for which the ray tracing acceleration data structure is defined (such that the corresponding execution threads processing those rays could in principle be retired at the point that is determined).

Maintaining all of the execution threads (for all of the rays) in the group in an active state such that the whole group of rays performs the traversal together may therefore result in some unnecessary processing in carrying those rays through the traversal operation.

However, in this respect the technology described herein further recognises that many rays will need to be tested for intersection with the same, or similar, volumes, at least in the higher levels of the ray tracing acceleration data structure.

For example, it may be expected that, or the graphics processor may be arranged to group plural rays such that, rays that are processed together as a group by a corresponding group of execution threads will show a high degree of ‘coherency’, e.g., such that they are expected to intersect similar geometry. Various arrangements would be possible in this regard for pooling rays and selecting rays from the pool as a suitable group of plural rays, as will be explained further below.

Thus, arranging for a group of plural rays to traverse the ray tracing acceleration data structure together can generally work well to provide an overall more efficient ray tracing operation.

That is, it has been found that in many ray tracing operations, the benefits of performing the traversal for a whole group's worth of rays together in terms of allowing for more rays to be tested in a single operation, with reduced memory bandwidth, typically outweigh the cost of any unnecessary processing that may be performed in order to keep all of the execution threads active during the traversal.

Moreover, the Applicants have recognised that performing the traversal operation for a group of rays as a whole, such that the corresponding execution thread group for the group of rays as a whole remains active, can in embodiments provide yet further improvements in terms of reducing memory bandwidth.

In particular, by keeping all of the execution threads in the execution thread group that is processing the group of plural rays active in this way, it is then possible to manage the traversal operation for the group of rays as a whole, e.g., and in an embodiment, using a set of shared (e.g. general purpose) registers allocated for the execution thread group.

For instance, as mentioned above, the traversal operation itself may be, and in an embodiment is, managed using an appropriate record, e.g. with the traversal record comprising a list of entries indicating which nodes have been determined to be intersected by the rays in the group of plural rays that are performing the traversal.

For example, and in an embodiment, in order to track which nodes are intersected by which rays, and therefore need to be tested against the ray, whenever it is determined that a ray that is being used for the ray tracing process intersects a given node, an indication that the node (volume) is intersected is then pushed (added) to a suitable record for the traversal operation. The record of which nodes represent volumes that contain geometry that might be intersected by a ray can then be read to determine which nodes need to be tested at the next level, and so on.

In an embodiment, the record for the traversal operation is provided in the form of a suitable traversal ‘stack’, and is in an embodiment managed using a ‘last-in-first-out’ scheme, e.g. in the normal way for a stack.

Thus, when the testing of a first parent node indicates that one or more of its child nodes are intersected, the child nodes are then pushed (added) to the stack, and then popped out (removed) for testing accordingly (such that the child nodes of the first parent node are tested before testing any other parent nodes at the same level as the first parent node). However, various other arrangements would be possible for tracking which nodes are to be tested and the traversal record may in general be arranged and managed in any suitable manner, as desired.
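The ‘last-in-first-out’ behaviour of such a traversal stack can be sketched, for illustration only, as below. The tree is modelled as a hypothetical dictionary mapping each parent node to its child nodes, and `is_hit` is a hypothetical stand-in for the ray-volume intersection testing (returning true if any ray in the group intersects the child volume); neither name comes from the technology described herein.

```python
from typing import Callable, Dict, Hashable, List


def traverse(root: Hashable,
             children_of: Dict[Hashable, List[Hashable]],
             is_hit: Callable[[Hashable], bool]) -> List[Hashable]:
    """Depth-first traversal driven by a single shared LIFO stack.

    Returns the order in which nodes are popped (tested).
    """
    stack = [root]
    order = []
    while stack:
        node = stack.pop()            # pop the most recently pushed node
        order.append(node)
        # Push intersected children in reverse so that the first child
        # ends up on top of the stack and is tested next.
        for child in reversed(children_of.get(node, [])):
            if is_hit(child):
                stack.append(child)
    return order


if __name__ == "__main__":
    tree = {"root": ["A", "B"], "A": ["A1", "A2"], "B": ["B1"]}
    # Every volume is treated as intersected in this toy example.
    print(traverse("root", tree, lambda node: True))
    # ['root', 'A', 'A1', 'A2', 'B', 'B1']
```

The printed order shows the child nodes of the first parent node (`A1`, `A2`) being tested before the other parent node at the same level (`B`), as described above.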

As explained above, because in the technology described herein the traversal of the ray tracing acceleration data structure is performed for a plurality of rays together, such that all of the execution threads processing the rays in the group of plural rays remain ‘active’ during the traversal, this means that a single, common traversal record can be maintained for the group of rays as a whole. Furthermore, because the execution thread group as a whole is in an embodiment kept active, it is possible, and it is a benefit of the technology described herein, that the traversal record can be managed entirely using a set of registers that have been allocated for the group of plural execution threads, such that the traversal operation for all of the rays being processed by the group of plural execution threads can be handled using registers.

This can therefore further reduce memory bandwidth requirements, e.g. compared to maintaining a separate record for each ray. For instance, rather than each ray having its own traversal record that is maintained in external memory, and the graphics processor having to always write out the result of the intersection testing for each ray to its respective (per-ray) record in memory, as may otherwise be done, in embodiments of the technology described herein, because the traversal is performed for a group of rays as a whole, the result of the intersection testing for the rays in the group can be (and in an embodiment is) written out to a shared traversal record that is in an embodiment maintained in a set of local registers that have been allocated for the execution thread group that is processing the group of plural rays performing the traversal. The traversal record is thus managed for the whole group of plural rays, e.g. rather than having to repeatedly read in respective records for the individual rays from memory, thus reducing memory bandwidth.

In this way the traversal stack can in an embodiment be managed (entirely) via the registers, in an embodiment on-chip, with the stack only being written out from the registers to memory in the event of overflow.

Thus, in embodiments, the traversal stack is managed, in an embodiment entirely, using the allocated registers for the thread group. In this way, where it is possible to do so, the state of the traversal record can be held entirely locally to the graphics processor, thus reducing memory bandwidth. The size of the data structures can be designed to try to ensure this is the case, at least in normal operation. In that case, the output of the ray-volume intersection testing (i.e. the node (volumes) that are intersected by a ray and therefore need to be tested for the traversal operation) is in an embodiment pushed to the traversal record without being written to memory.

In an embodiment, the output of the intersection testing is only written to memory when writing a result of the intersection testing to the record would cause an overflow of the record. For instance, the traversal record generally has a finite number of entries (e.g., and especially where it is managed using the registers, which typically have a fixed size). For example, in an exemplary embodiment, the traversal record may be 8 entries deep.

Thus, if a particular instance of ray-volume testing results in a large number of intersections, writing a result for each of the determined intersections may cause the traversal record to overflow. In that case, the entire traversal record is in an embodiment copied to the overflow output, e.g., and then written out to memory in its current form. A suitable indicator of the overflow state is then in an embodiment outputted and included into the traversal record, such that the entries that were written out to memory because of the overflow can later be loaded back into the stack for testing. Thus, in the event of a record overflow, the current record is written to memory and then cleared, and an indication that this has happened is included into the record.

For instance, and in an embodiment, the overflow state is also (always) returned as output for the ray-volume intersection testing. Because the record is in an embodiment written out as a whole, the overflow output will always be either zero or not zero so it is easy to detect when overflow has occurred (e.g. rather than trying to identify which pushes caused overflow, etc.).
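As a rough sketch of this overflow handling, and under a number of assumptions that are not taken from the technology described herein, a fixed-depth record might behave as follows. Here the 8-entry depth follows the exemplary embodiment above, a Python list named `spilled` stands in for external memory, the overflow indicator is modelled as a tagged tuple placed at the bottom of the cleared record, and each push is assumed to carry at most six entries; the class and method names are hypothetical.

```python
from typing import List, Optional, Union

Marker = tuple  # ("overflow", token): hypothetical overflow indicator entry


class TraversalRecord:
    CAPACITY = 8  # exemplary depth of the register-held record

    def __init__(self) -> None:
        self.entries: List[Union[int, Marker]] = []
        self.spilled: List[List[Union[int, Marker]]] = []  # stands in for memory

    def push_results(self, hit_nodes: List[int]) -> int:
        """Push intersected nodes; return the overflow output (0 if none)."""
        overflow = 0
        if len(self.entries) + len(hit_nodes) > self.CAPACITY:
            # Write the whole current record out, clear it, and leave an
            # indication in the record that this has happened.
            self.spilled.append(list(self.entries))
            overflow = len(self.spilled)  # any non-zero value signals overflow
            self.entries = [("overflow", overflow)]
        self.entries.extend(hit_nodes)
        return overflow

    def pop(self) -> Optional[int]:
        """Pop the next node to test, reloading spilled entries when needed."""
        while self.entries:
            entry = self.entries.pop()
            if isinstance(entry, tuple) and entry[0] == "overflow":
                # Reached the indicator: load the written-out record back in.
                self.entries = self.spilled.pop()
                continue
            return entry
        return None


if __name__ == "__main__":
    record = TraversalRecord()
    print(record.push_results([1, 2, 3, 4, 5, 6]))  # 0: fits in the record
    print(record.push_results([7, 8, 9]))           # non-zero: record spilled
```

Because the record is written out as a whole in this sketch, the caller only needs to compare the returned value with zero to detect that an overflow has occurred.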

In the technology described herein, the traversal record is thus in an embodiment maintained locally via the registers, without necessarily having to access external memory other than in overflow situations.

Thus, in embodiments, when it is desired to perform intersection testing for one or more rays in the group of plural rays that are performing the traversal operation together in one go (using a single memory load operation), the data for those rays can be loaded in one go, and the result of the intersection testing can then be outputted to the traversal record that is being maintained via the registers for the execution thread group as a whole (without necessarily having to write out to memory other than in overflow situations).

In this way the efficiency of the ray-volume intersection testing operations can be improved, in particular by reducing the memory bandwidth requirements for performing the ray-volume intersection testing and for managing the traversal.

The effect of all of this then is that the overall ray tracing rendering process can be performed more efficiently, thereby facilitating, for example, performing ray tracing and/or improved ray tracing, e.g. on devices whose processing resources may be more limited.

The technology described herein may therefore provide various improvements compared to other possible approaches.

The technology described herein relates to the situation where a frame that represents a view of a scene comprising one or more objects is being rendered using a ray tracing process.

In this process, the frame that is being rendered will, and in an embodiment does, comprise an array of sampling positions, and a ray tracing process will be used to render each of the sampling positions so as to provide an output frame (an image) that represents the desired view of the scene (with respective rays that are cast corresponding to and being used when rendering and to render respective sampling positions for the frame).

The technology described herein can be used for any form of ray tracing based rendering. Thus, for example, the technology described herein can be used for and when a “full” ray tracing process is being used to render a scene, i.e. in which so-called “primary” rays are cast from a view point (the camera) through a sampling position in the image frame to determine the intersection of that ray with objects in the scene, e.g., and in an embodiment, to determine, for each ray, a closest object in a scene that the ray intersects (a “first intersection point” of the ray). The process may involve casting further (secondary) rays from the respective first intersection points of primary rays with objects in the scene, and additionally using the intersection data for the secondary rays in determining the rendering of the sampling positions.

In this case, the operation in the manner of the technology described herein may be, and is in an embodiment, used when and for analysing the intersections of both primary and secondary rays with objects in the scene.

The technology described herein can also be used for so-called “hybrid” ray tracing rendering processes, e.g. in which both ray tracing and rasterisation processes are performed when performing rendering (e.g. in which only some of the steps of a full ray tracing process are performed, with a rasterisation process or processes being used to implement other steps of the “full” ray tracing process). For example, in an exemplary hybrid ray tracing process, the first intersection of each of the primary rays with objects in the scene may be determined using a rasterisation process, but with the casting of one or more further (secondary) rays from the determined respective first intersection points of primary rays with objects in the scene then being performed using a ray tracing process.

In this case, the operation in the manner of the technology described herein may be, and is in an embodiment, used when and for analysing the intersections of the secondary rays with objects in the scene.

The ray-tracing based rendering of a frame that is performed in the technology described herein is triggered and performed by the programmable execution unit of the graphics processor executing a graphics processing program that will cause (and that causes) the programmable execution unit to perform the necessary ray tracing rendering process.

Thus, a graphics shader program or programs, including a set (sequence) of program instructions that when executed will perform the desired ray tracing rendering process, will be issued to the graphics processor and executed by the programmable execution unit. The shader program(s) may include only instructions necessary for performing the particular ray tracing based rendering operations, or it may also include other instructions, e.g. to perform other shading operations, if desired.

Subject to the particular operation in the manner of the technology described herein, the execution of the shader program to perform the desired ray tracing process can otherwise be performed in any suitable and desired manner, such as, and in an embodiment, in accordance with the execution of shader programs in the graphics processor and graphics processing system in question.

Thus, the graphics processor (the programmable execution unit of the graphics processor) will operate to execute the shader program(s) that includes a sequence of instructions to perform the desired ray tracing rendering process, for plural, and in an embodiment for each, sampling position, of the frame that is to be rendered.

Correspondingly, when executing the ray tracing shader program, the graphics processor will operate to spawn (issue) respective execution threads for the sampling positions of the frame being rendered, with each thread then executing the program(s) so as to render the sampling position that the thread represents (and corresponds to). The graphics processor accordingly in an embodiment comprises a thread spawner (a thread spawning circuit) operable to, and configured to, spawn (issue) execution threads for execution by the programmable execution unit.

The ray tracing rendering shader program(s) that is executed by the programmable execution unit can be prepared and generated in any suitable and desired manner.

In an embodiment, it is, or they are, generated by a compiler (the shader compiler) for the graphics processor of the graphics processing system in question (and thus the processing circuit that generates the shading program in an embodiment comprises an appropriate compiler circuit). The compiler is in an embodiment executed on an appropriate programmable processing circuit of the graphics processing system.

In a graphics processing system that is operable in the manner of the technology described herein, in embodiments of the technology described herein at least, a compiler, e.g. executing on a host processor, will generate and issue to the graphics processor one or more shader programs that when executed will perform the required ray tracing-based rendering operations in accordance with the technology described herein, with the graphics processor (the programmable execution unit of the graphics processor) then executing the programs to perform the ray tracing-based rendering, and as part of that program execution exchanging the messages discussed above with the ray tracing acceleration data structure traversal circuit of the graphics processor.

The operation of the technology described herein can be (and is) implemented and triggered by including appropriate ‘ray-volume’ intersection testing instructions in the ray tracing rendering shader program to be executed by the programmable execution unit that will trigger the desired ray-volume intersection testing to be performed, e.g., and in embodiments, by triggering the execution unit to send an appropriate message to the intersection testing circuit (with the execution unit then sending the message when it reaches (executes) the relevant instruction in the shader program). (Appropriate instructions for causing the execution threads to be in the active state, and also for performing the ray-primitive testing, at least where this is triggered by the same shader program, can also be included appropriately into the shader program.)

Such instructions can be included in a shader program to be executed by the programmable execution unit in any suitable and desired manner and by any suitable and desired element of the overall data (graphics) processing system.

For instance, in an embodiment, the “ray-volume” intersection testing instruction is included in the shader program by the compiler (the shader compiler) for the graphics processor. Thus the compiler in an embodiment inserts a “ray-volume” intersection testing instruction at the appropriate point in the ray tracing rendering shader program that is performing the ray tracing.

In an embodiment, a “ray-volume” intersection testing instruction is included in the ray tracing rendering shader program that is to be executed by the graphics processor by the compiler in response to an appropriate ray tracing indication (e.g. a “trace( )” call), included in the (high level) shader program that is provided by the application that requires the graphics processing. Thus, e.g., and in an embodiment, an application program will be able to include an explicit indication of a need for a ray-volume intersection testing instruction in respect of a node during the ray tracing operation, with the compiler then, in the technology described herein, including an appropriate “ray-volume” intersection testing instruction in the compiled shader program in response to that. It may also be possible for the compiler to include a “ray-volume” intersection testing instruction of its own accord, e.g. in the case where the compiler is able to assess the shader program being compiled to identify when and where to include a “ray-volume” intersection testing instruction or instructions, even in the absence of an explicit indication of that.

In an embodiment, the compiler analyses the shader program code that is provided, e.g. by the application on the host processor that requires the graphics processing, and includes a “ray-volume” intersection testing instruction or instructions at the appropriate point(s) in the shader program (e.g. by inserting the instruction(s) in the (compiled) shader program).

The technology described herein also extends to and includes such operation of a compiler.

Thus, a further embodiment of the technology described herein comprises a method of compiling a shader program to be executed by a programmable execution unit of a graphics processor that is operable to execute graphics processing programs to perform graphics processing operations;

the method comprising:

-   for a shader program to be executed by a programmable execution unit of a graphics processor when rendering a frame that represents a view of a scene comprising one or more objects using a ray tracing process,
-   wherein the ray tracing process uses a ray tracing acceleration data structure indicative of the distribution of geometry for the scene to be rendered to determine geometry for the scene that may be intersected by a ray being used for a ray tracing operation, the ray tracing acceleration data structure comprising a plurality of nodes, each node associated with a respective one or more volumes within the scene,
-   the ray tracing process comprising performing for a plurality of rays a traversal of the ray tracing acceleration data structure to determine, by testing the rays for intersection with the volumes represented by the nodes of the acceleration data structure, geometry for the scene to be rendered that may be intersected by the rays, and
-   wherein the program is to be executed by a group of plural execution threads, with individual execution threads in the group of execution threads performing a traversal operation for a respective ray in a corresponding group of rays:
-   including in the shader program a set of one or more ‘ray-volume testing’ instructions for testing rays for intersection with the one or more volumes associated with a given node of the ray tracing acceleration data structure that is to be tested during the traversal operation, which set of ‘ray-volume testing’ instructions, when executed by execution threads of the group of plural execution threads, will cause:
    -   the graphics processor to test one or more rays from the group of plural rays that are performing the traversal operation together for intersection with the one or more volumes associated with the node being tested; and
    -   a result of the intersection testing to be returned for the node for the traversal operation.

A further embodiment of the technology described herein comprises a compiler for compiling a shader program to be executed by a programmable execution unit of a graphics processor that is operable to execute graphics processing programs to perform graphics processing operations;

the compiler comprising a processing circuit configured to:

-   for a shader program to be executed by a programmable execution unit of a graphics processor when rendering a frame that represents a view of a scene comprising one or more objects using a ray tracing process,
-   wherein the ray tracing process uses a ray tracing acceleration data structure indicative of the distribution of geometry for the scene to be rendered to determine geometry for the scene that may be intersected by a ray being used for a ray tracing operation, the ray tracing acceleration data structure comprising a plurality of nodes, each node associated with a respective one or more volumes within the scene,
-   the ray tracing process comprising performing for a plurality of rays a traversal of the ray tracing acceleration data structure to determine, by testing the rays for intersection with the volumes represented by the nodes of the acceleration data structure, geometry for the scene to be rendered that may be intersected by the rays, and wherein the program is to be executed by groups of plural execution threads, with individual execution threads in the group of execution threads performing a traversal operation for a respective ray in a corresponding group of rays:
-   include in the shader program:
    -   a set of one or more ‘ray-volume testing’ instructions for testing rays for intersection with the one or more volumes associated with a given node of the ray tracing acceleration data structure that is to be tested during the traversal operation, which set of ray-volume testing instructions, when executed by execution threads of the group of plural execution threads, will cause:
        -   the graphics processor to test one or more rays from the group of plural rays that are performing the traversal operation together for intersection with the one or more volumes associated with the node being tested; and
        -   a result of the intersection testing to be returned for the node for the traversal operation.

In an embodiment the compiler also includes in the program a set of one or more instructions that cause the execution threads in the group of execution threads to be in an active state at least until the traversal operation to determine which, if any, geometry for the scene may be intersected by the rays is finished for all of the rays in the group of rays being processed by the group of execution threads, such that the group of rays performs the traversal operation together, as described above.

The compiler (the compiler processing circuit) is in an embodiment part of, and in an embodiment executes on, a central processing unit (CPU), such as a host processor, of the graphics processing system, and is in an embodiment part of a driver for the graphics processor that is executing on the CPU (e.g. host processor).

In this case, the compiler and compiled code will run on separate processors within the overall graphics processing system. However, other arrangements would be possible, such as the compiler running on the same processor as the compiled code, if desired.

The compilation process (the compiler) can generate the ray tracing rendering shader program in any suitable and desired manner, e.g., and in an embodiment, using any suitable and desired compiler techniques for that purpose.

Thus, in an embodiment, the shader program is generated by the compiler, and the compiler is arranged to include within the shader program the instructions that are used in the technology described herein. Other arrangements would, of course, be possible.

The generated shader program can then be issued to the programmable execution unit of the graphics processor for execution thereby.

The technology described herein also extends to the operation of the graphics processor itself when executing the shader program.

A further embodiment of the technology described herein comprises a method of operating a graphics processor when rendering a frame that represents a view of a scene comprising one or more objects using a ray tracing process, wherein the ray tracing process uses a ray tracing acceleration data structure indicative of the distribution of geometry for the scene to be rendered to determine geometry for the scene that may be intersected by a ray being used for a ray tracing operation, the ray tracing acceleration data structure comprising a plurality of nodes, each node associated with a respective one or more volumes within the scene, the ray tracing process comprising performing for a plurality of rays a traversal of the ray tracing acceleration data structure to determine, by testing the rays for intersection with the volumes represented by the nodes of the acceleration data structure, geometry for the scene to be rendered that may be intersected by the rays;

the graphics processor comprising a programmable execution unit operable to execute programs to perform graphics processing operations, and in which a program can be executed by groups of plural execution threads together;

the method comprising:

when a group of execution threads is executing a program to perform a ray tracing acceleration data structure traversal, with individual execution threads in the group of execution threads performing a traversal operation for a respective ray in a corresponding group of rays such that the group of rays performs the traversal operation together, in response to the execution threads executing a set of one or more ‘ray-volume testing’ instructions that are included in the program in respect of a node of the ray tracing acceleration data structure:

-   testing one or more rays from the group of plural rays that are performing the traversal operation together for intersection with the one or more volumes associated with the node being tested; and

-   returning a result of the intersection testing for the node for the traversal operation.

A yet further embodiment of the technology described herein comprises a graphics processor that is operable to render a frame that represents a view of a scene comprising one or more objects using a ray tracing process, wherein the ray tracing process uses a ray tracing acceleration data structure indicative of the distribution of geometry for the scene to be rendered to determine geometry for the scene that may be intersected by a ray being used for a ray tracing operation, the ray tracing acceleration data structure comprising a plurality of nodes, each node associated with a respective one or more volumes within the scene, the ray tracing process comprising performing for a plurality of rays a traversal of the ray tracing acceleration data structure to determine, by testing the rays for intersection with the volumes represented by the nodes of the acceleration data structure, geometry for the scene to be rendered that may be intersected by the rays;

the graphics processor comprising:

a programmable execution unit operable to execute programs to perform graphics processing operations, and in which a program can be executed by groups of plural execution threads together;

wherein the execution unit is configured such that, when a group of execution threads is executing a program to perform a ray tracing acceleration data structure traversal, with individual execution threads in the group of execution threads performing a traversal operation for a respective ray in a corresponding group of rays that are thereby performing the traversal operation together, in response to the execution threads executing a set of one or more ‘ray-volume testing’ instructions included in the program in respect of a node of the ray tracing acceleration data structure:

the execution unit triggers testing of one or more rays from the group of plural rays that are performing the traversal of the ray tracing acceleration data structure together for intersection with the one or more volumes associated with the node being tested, wherein a result of the intersection testing is then returned for the node for the traversal operation.

As will be appreciated by those skilled in the art, these additional embodiments of the technology described herein relating to the operation of the compiler and/or the graphics processor can, and in an embodiment do, include any one or more or all of the features of the technology described herein, as appropriate.

When executing the shader program to perform the ray tracing-based rendering process, as it is a ray tracing-based rendering process, the performance of that process will include the tracing of rays into and through the scene being rendered, e.g., and in an embodiment, so as to determine how a given sampling position that the ray or rays in question correspond to should be rendered to display the required view of the scene at that sampling position.

The graphics processor can be any suitable and desired graphics processor that includes a programmable execution unit (circuit) that can execute program instructions.

The programmable execution unit can be any suitable and desired programmable execution unit (circuit) that a graphics processor may contain. It should be operable to execute graphics shading programs to perform graphics processing operations. Thus the programmable execution unit will receive graphics threads to be executed, and execute appropriate graphics shading programs for those threads to generate the desired graphics output.

Once a thread has finished its respective processing operation, the thread can then be ‘retired’, e.g. with a new execution thread then being spawned in its place.

The graphics processor may comprise a single programmable execution unit, or may have plural execution units. Where there are plural execution units, each execution unit can, and in an embodiment does, operate in the manner of the technology described herein. Where there are plural execution units, each execution unit may be provided as a separate circuit to other execution units of the data processor, or the execution units may share some or all of their circuits (circuit elements).

The (and each) execution unit should, and in an embodiment does, comprise appropriate circuits (processing circuits/logic) for performing the operations required of the execution unit.

According to the technology described herein the graphics processor and the programmable execution unit are operable to execute shader programs for groups (“warps”) of plural execution threads together, e.g. in lockstep, e.g., one instruction at a time. In that case, the execution threads in the execution thread group in an embodiment perform the same traversal operation, but for different rays, e.g., and in an embodiment, in a single instruction, multiple data (SIMD) execution state.

The groups of execution threads can therefore (and do) each process a corresponding group of plural rays for the ray tracing operation.

According to the technology described herein, the graphics processor is thus configured to, and operable to, group rays (traversal requests) that are to traverse the same acceleration data structure together, so as to execute the traversals of the acceleration data structure for the rays of the group of rays together.

The grouping may be performed in any suitable fashion as desired, but in an embodiment rays that are sufficiently similar to each other and that are to traverse the same acceleration data structure are grouped together, so as to execute the traversals of the acceleration data structure for the rays of the group together. This will help to increase memory locality, and, accordingly, improve the effectiveness of any caching of the ray tracing acceleration data structure (and correspondingly reduce the number of off-chip memory accesses that may be required).

In this case, the rays are in an embodiment grouped together based on their similarities to each other, such that “similar” rays will be grouped together for this purpose. Thus rays are in an embodiment grouped for traversing the (same) ray tracing acceleration data structure together based on one or more particular, in an embodiment selected, in an embodiment predefined criteria, such as one or more of, and in an embodiment all of: the starting positions (origins) for the rays; the directions (direction vectors) of the rays; and the range that the rays are to be cast for.

Thus, in an embodiment, rays can be, and are, grouped together for the ray tracing acceleration data structure traversal process if and when their positions (origins), directions, and/or ranges, are sufficiently similar (e.g., and in an embodiment, are within a particular threshold range or margin of each other) (and the rays are to traverse the same ray tracing acceleration data structure). This will then facilitate performing the ray tracing acceleration data structure traversals for similar rays together, thereby increasing memory access locality, etc., and thus making the ray tracing acceleration data structure traversal operation more efficient.
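By way of illustration only, the threshold-based grouping described above might be sketched in software as follows. This is a minimal sketch, not the grouping logic of any particular graphics processor: the `Ray` type, the `similar` and `group_rays` functions, and all threshold values are hypothetical names and numbers chosen for the example, and all rays are assumed to traverse the same acceleration data structure.

```python
import math
from dataclasses import dataclass


@dataclass
class Ray:
    origin: tuple     # (x, y, z) starting position
    direction: tuple  # unit-length direction vector
    t_max: float      # range the ray is to be cast for


def similar(a, b, max_origin_dist=1.0, min_cos=0.95, max_range_diff=10.0):
    """Hypothetical similarity test: origins, directions and ranges
    must all lie within (assumed) thresholds of each other."""
    origin_dist = math.dist(a.origin, b.origin)
    cos_angle = sum(x * y for x, y in zip(a.direction, b.direction))
    return (origin_dist <= max_origin_dist
            and cos_angle >= min_cos
            and abs(a.t_max - b.t_max) <= max_range_diff)


def group_rays(rays, max_group_size=16):
    """Greedily place each ray in the first non-full group whose first
    ray it is similar to; otherwise start a new group."""
    groups = []
    for ray in rays:
        for group in groups:
            if len(group) < max_group_size and similar(group[0], ray):
                group.append(ray)
                break
        else:
            groups.append([ray])
    return groups
```

Each resulting group would then correspond to one execution thread group, with one ray per thread. A greedy first-fit pass is used here purely for brevity; other grouping strategies would equally be possible.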

In order to facilitate this operation, the graphics processor can in an embodiment maintain a “pool” of rays that are waiting to traverse an acceleration data structure (e.g. in an appropriate queue or buffer (cache) on or accessible to the graphics processor), and select groups of one or more rays from that pool for processing, e.g., and in an embodiment, based on one or more or all of the criteria discussed above. A suitable execution thread group may then be spawned for the selected group of rays, and a program executed to cause the group of rays to perform the traversal operation together. This will then facilitate the ray tracing acceleration data structure traversal processing groups of similar rays together.

The graphics processor correspondingly in an embodiment comprises an appropriate controller operable to select and group rays for which ray tracing acceleration data structure traversals are to be performed from the “pool”, and to cause ray tracing acceleration data structure traversals to be performed for groups of rays together.

In this case, rays that are in the “pool” and that are waiting to traverse a ray tracing acceleration data structure in an embodiment have their duration in the pool (their “ages”) tracked, with any ray whose duration in the pool exceeds a particular, in an embodiment selected, in an embodiment predetermined, threshold duration (“age”), then being prioritised for processing, e.g., and in an embodiment, without waiting any further for later, “similar” rays to arrive for processing. This will then help to ensure that rays are not retained in the pool for too long whilst waiting for other rays potentially to group with the ray.

The rays in the pool may, for example, be time-stamped for this purpose so that their ages in the pool can be tracked. Other arrangements would, of course, be possible. Once a group of rays to be processed together has been selected, then the rays should be processed together as a group, e.g. by spawning a suitable execution thread group, and causing the execution thread group to execute a program that causes the plural rays to traverse the ray tracing acceleration data structure together, in the manner described above.
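The pool and age-tracking behaviour described above could, purely as an illustrative sketch, be modelled as follows. The `RayPool` class, its method names, and the use of a precomputed similarity `key` per ray (e.g. a quantised origin and direction) are all assumptions made for this example; timestamps are assumed to be supplied in non-decreasing order.

```python
class RayPool:
    """Hypothetical pool of waiting rays, time-stamped on entry."""

    def __init__(self, max_age, group_size):
        self.max_age = max_age        # assumed age threshold
        self.group_size = group_size  # assumed maximum rays per group
        self.entries = []             # (timestamp, key, ray_id), oldest first

    def add(self, ray_id, key, now):
        self.entries.append((now, key, ray_id))

    def _take(self, chosen):
        for entry in chosen:
            self.entries.remove(entry)
        return [ray_id for _, _, ray_id in chosen]

    def select_group(self, now):
        """Return the ray ids of the next group to process, or [] to keep waiting."""
        buckets = {}
        for entry in self.entries:
            buckets.setdefault(entry[1], []).append(entry)
        # Prefer any bucket of similar rays that already fills a whole group.
        for members in buckets.values():
            if len(members) >= self.group_size:
                return self._take(members[:self.group_size])
        # Otherwise prioritise a ray whose age exceeds the threshold,
        # together with whatever similar rays are currently available.
        for timestamp, key, _ in self.entries:
            if now - timestamp > self.max_age:
                return self._take(buckets[key][:self.group_size])
        return []
```

In this sketch an overdue ray is dispatched with a partially filled group rather than waiting further, which reflects the prioritisation described above; how a real controller would trade group occupancy against latency is not specified here.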

The groups of rays for which the traversals of the ray tracing acceleration data structure are performed together can comprise any suitable and desired (plural) number of rays, although there may, e.g., and in an embodiment, be a particular, in an embodiment selected, in an embodiment defined, maximum number of rays for which the traversals may be performed together, e.g. depending upon the parallel processing capability of the ray tracing acceleration data structure traversal circuit in this regard.

Other arrangements would, of course, be possible. Thus, in the technology described herein, the group of one or more execution threads comprises plural execution threads, and corresponds to a thread group (warp) that is executing the program in lockstep. In an embodiment, the group of execution threads comprises more than two execution threads, such as four, eight or sixteen (or more, such as 32, 64 or 128) execution threads.

The ray tracing operation according to the technology described herein is performed using a ray tracing acceleration data structure. The ray tracing acceleration data structures that are used and traversed in the technology described herein can be any suitable and desired ray tracing acceleration data structures that are indicative of (that represent) the distribution of geometry for a scene to be rendered and that can be used (and traversed) to determine geometry for a scene to be rendered that may be intersected by a ray being projected into the scene.

The ray tracing acceleration data structure in an embodiment represents (a plurality of) respective volumes within the scene being rendered and indicates and/or can be used to determine geometry for the scene to be rendered that is present in those volumes.

The ray tracing acceleration data structure(s) can take any suitable and desired form. In an embodiment the ray tracing acceleration data structure(s) comprise a tree structure, such as a bounding volume hierarchy (BVH) tree. The bounding volumes may be axis aligned (cuboid) volumes. Thus, in one embodiment, the ray tracing acceleration data structure comprises a bounding volume hierarchy, and in an embodiment a BVH tree.

The BVH is a tree structure with primitives (which may be triangles, or other suitable geometric objects) at the leaf nodes. The primitives at the leaf nodes are wrapped in bounding volumes. In an embodiment the bounding volumes are axis aligned bounding boxes. The bounding volumes are then recursively clustered and wrapped in bounding volumes until a single root node is reached. At each level of the recursion two or more bounding volumes may be clustered into a single parent bounding volume. For instance, and in an embodiment, each non-leaf node has a corresponding plurality of child nodes.

In an embodiment the ray tracing acceleration data structure used in the technology described herein comprises a ‘wide’ tree structure, in which each parent node may be (and in an embodiment is) associated with greater than two child nodes, such as three, four, five, six, or more, child nodes. In an embodiment each parent node may be associated with up to six child nodes. In that case, each instance of ray-volume intersection testing in an embodiment comprises testing one or more rays in the group of plural rays against each of the plural child nodes.
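To make the ‘wide’ tree arrangement concrete, the following is a minimal sketch of a bottom-up build in which leaf bounding boxes are wrapped into parent bounding boxes with up to six children each until a single root remains. The dictionary-based node layout and the `union` and `build_wide_bvh` names are hypothetical. Leaves are simply clustered in input order, which is an assumption made for brevity (a practical builder would cluster spatially), and at least one leaf box is assumed.

```python
def union(boxes):
    """Axis-aligned box ((min_x, min_y, min_z), (max_x, max_y, max_z))
    enclosing all of the given boxes."""
    lows, highs = zip(*boxes)
    low = tuple(min(axis) for axis in zip(*lows))
    high = tuple(max(axis) for axis in zip(*highs))
    return (low, high)


def build_wide_bvh(leaf_boxes, width=6):
    """Cluster up to `width` nodes per parent, level by level,
    until a single root node is reached."""
    level = [{"box": box, "children": [], "primitive": index}
             for index, box in enumerate(leaf_boxes)]
    while len(level) > 1:
        parents = []
        for start in range(0, len(level), width):
            children = level[start:start + width]
            parents.append({"box": union([c["box"] for c in children]),
                            "children": children})
        level = parents
    return level[0]
```

With seven unit boxes laid out along the x axis, for example, this produces a root with two children, the first of which has six leaf children.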

However, other suitable ray tracing acceleration data structures may also be used, as desired. For instance, rather than using a BVH, where the scene is subdivided by volume on a per-object basis, e.g. by drawing suitable bounding volumes around subsets of geometry, e.g., and in an embodiment, such that each leaf node (volume) corresponds to a certain number of objects (primitives), the scene could instead be subdivided on a per-volume basis, e.g. into substantially equally sized sub-volumes. For example, the ray tracing acceleration data structure may comprise a k-d tree structure, a voxel (grid hierarchy), etc., as desired. It would also be possible to use ‘hybrid’ ray tracing acceleration data structures where the scene is subdivided in part on a per-object basis and in part on a per-volume basis. Various other arrangements would be possible and the technology described herein may in general be used with any suitable ray tracing acceleration data structure.

The ray tracing acceleration data structure that is traversed can be generated and provided in any suitable and desired manner. For example, it may be previously determined and provided, e.g., as part of the definition of the scene to be rendered by the application that requires the graphics processing.

In an embodiment, the ray tracing acceleration data structure is generated by the graphics processor itself, e.g. based on an indication of geometry for the scene that is provided to the graphics processor, e.g. in a preliminary processing pass before the scene is rendered.

It could also or instead be generated by a CPU (e.g. host processor), e.g. based on an indication of geometry for the scene, e.g. in a preliminary processing pass before the scene is rendered.

Other arrangements would, of course, be possible. The ray tracing acceleration data structure can represent and be indicative of the distribution of geometry for a scene to be rendered in any suitable and desired manner. Thus it may represent the geometry in terms of individual graphics primitives, or sets of graphics primitives, e.g. such that each leaf node of the tree structure represents a corresponding subset of the graphics primitives defined for the scene that occupies the volume that the leaf node corresponds to. Additionally or alternatively, the ray tracing acceleration data structure could represent the geometry for the scene in the form of higher level representations (descriptions) of the geometry, for example in terms of models or objects comprising plural primitives.

It would also be possible for a given ray tracing acceleration data structure to represent the geometry in terms of indicating further ray tracing acceleration data structures that need to be analysed. In this case, an initial ray tracing acceleration data structure would, for example, represent further, e.g. finer resolution, ray tracing acceleration data structures that need to be considered for different volumes of the scene, with the traversal of the initial ray tracing acceleration data structure then determining a further ray tracing acceleration data structure or structures that need to be traversed depending upon which volumes for the scene the ray in question intersects.

Thus the ray tracing traversal operation could include transitions between different ray tracing acceleration data structures, such as transitions between different levels of detail (LOD), and/or between different levels of multi-level ray tracing acceleration data structures.

There may also be ray transformations between ray tracing acceleration data structure switches (e.g. such that there is an automatic transition between different ray tracing acceleration data structures with and/or using a transformation of the ray, e.g. described by metadata of or associated with the ray tracing acceleration data structure). For example, a transition between different levels of detail could use an identity transform, and transitions between multi-level ray tracing acceleration data structures could use generic affine transformations of the rays.
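As an illustrative sketch of such a ray transformation, an affine transform can be applied to a ray by transforming its origin with the full transform (linear part plus translation) and its direction with the linear part only. The `transform_ray` function and its argument layout (a 3x3 matrix given as rows, plus a translation vector) are assumptions made for this example, not a format defined by the technology described herein.

```python
def transform_ray(origin, direction, matrix, translation):
    """Apply the affine transform x -> matrix @ x + translation to a ray.

    The origin is a position, so it receives the translation; the
    direction is a vector, so only the linear part is applied.
    """
    new_origin = tuple(
        sum(matrix[row][col] * origin[col] for col in range(3)) + translation[row]
        for row in range(3))
    new_direction = tuple(
        sum(matrix[row][col] * direction[col] for col in range(3))
        for row in range(3))
    return new_origin, new_direction
```

An identity matrix with a zero translation leaves the ray unchanged, corresponding to the identity transform mentioned above for level of detail transitions. Note that a transform that scales the direction vector also rescales the ray parameter, so a ray's range would need to be interpreted accordingly.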

Other arrangements would, of course, be possible. The traversal operation can traverse the ray tracing acceleration data structure(s) for a ray in any suitable and desired manner, e.g., and in an embodiment in dependence upon the form of the ray tracing acceleration data structure that is being traversed. The traversal operation will use the information provided about the ray to traverse the ray tracing acceleration data structure to determine geometry for the scene to be rendered that may be intersected by the ray in question.

Thus, the traversal process in an embodiment operates to traverse the ray tracing acceleration data structure to determine for each volume of the scene that the ray passes through in turn, whether there is any geometry in the volume (indicated by the ray tracing acceleration data structure). Thus, the ray tracing acceleration data structure will be traversed based on the position and direction of the ray, to determine whether there is any geometry in the volumes of the scene along the path of the ray (which could, accordingly, then potentially be intersected by the ray). Other arrangements would, of course, be possible.

In particular, the traversal process involves, for a ray (in the group of plural rays for which the traversal is being performed) that is being used for the ray tracing process, testing the ray for intersection with one or more (child node) volumes associated with a node of the ray tracing acceleration data structure to determine which of the associated volumes (i.e. child nodes) is intersected by the ray. The traversal process then comprises subsequently testing the ray for intersection with the volumes associated with the (child) node in the next level of the ray tracing acceleration data structure, and so on, down to the lowest level (leaf) nodes. Once the traversal process has worked through the ray tracing acceleration data structure, by performing the required ray-volume intersection testing for the nodes to determine which volumes (represented by end/leaf nodes) contain geometry that may be intersected by the ray, the ray can then be further tested to determine the actual (ray-primitive) intersections with the geometry defined within those volumes (and only within those volumes) (with any intersected geometry then being shaded appropriately).
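The traversal described above can be sketched, for a single ray, as a stack-based walk in which the child volumes of each visited node are tested and only intersected children are followed. This is a minimal software sketch under stated assumptions: the node layout (dictionaries with a `box`, and either `children` or `primitives`), the function names, and the use of the standard ‘slab’ ray-box test are all choices made for the example rather than details taken from the technology described herein.

```python
def hits_box(origin, direction, t_min, t_max, box):
    """Slab test of a ray against an axis-aligned box (low, high)."""
    low, high = box
    for axis in range(3):
        if direction[axis] == 0.0:
            # Ray is parallel to this slab: it must start inside it.
            if not low[axis] <= origin[axis] <= high[axis]:
                return False
            continue
        t0 = (low[axis] - origin[axis]) / direction[axis]
        t1 = (high[axis] - origin[axis]) / direction[axis]
        if t0 > t1:
            t0, t1 = t1, t0
        t_min = max(t_min, t0)
        t_max = min(t_max, t1)
        if t_min > t_max:
            return False
    return True


def traverse(root, origin, direction, t_min=0.0, t_max=float("inf")):
    """Return primitives in leaf volumes that the ray may intersect."""
    candidates = []
    stack = [root]
    while stack:
        node = stack.pop()
        if "primitives" in node:  # leaf: geometry to be tested exactly later
            candidates.extend(node["primitives"])
            continue
        for child in node["children"]:  # one ray-volume test per child volume
            if hits_box(origin, direction, t_min, t_max, child["box"]):
                stack.append(child)
    return candidates
```

The returned candidates are only potential intersections; the actual ray-primitive intersection testing would follow as a separate step, as described above.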

Subject to the requirements of the technology described herein the traversal can be performed in any suitable fashion, as desired.

In an embodiment, the traversal operation traverses the ray tracing acceleration data structure for the path of the ray until a first (potential) intersection with geometry defined for the scene is found for the ray. However, it would also be possible to continue traversal of the ray tracing acceleration data structure after a first (potential) intersection has been found for a ray, if desired.

For example, the ray traversal operation could be (and in an embodiment is) configured and able to discard (ignore) a (potential) intersection and to carry on with the traversal, e.g. depending upon the properties of the geometry for the intersection in question. For example, if a (potentially) intersected geometry is fully or partially transparent, it may be desirable to continue with the traversal (and either discard or retain the initial “transparent” intersection).

Other arrangements would, of course, be possible. The ray tracing acceleration data structure traversal for a ray could comprise traversing a single ray tracing acceleration data structure for the ray, or traversing plural ray tracing acceleration data structures for the ray. Thus, in an embodiment the ray tracing acceleration data structure traversal operation for a ray comprises traversing plural ray tracing acceleration data structures for the ray, to thereby determine geometry for the scene to be rendered that may be intersected by the ray.

Plural ray tracing acceleration data structures may be traversed for a ray e.g. in the case where the overall volume of, and/or geometry for, the scene is represented by plural different ray tracing acceleration data structures.

Similarly, as discussed above, in one embodiment, a ray tracing acceleration data structure that indicates further ray tracing acceleration data structures to be traversed is used. In this case therefore the ray tracing acceleration data structure traversal circuit will operate to first traverse an initial ray tracing acceleration data structure for the ray to determine one or more further ray tracing acceleration data structures to be traversed for the ray, and to then traverse those determined one or more ray tracing acceleration data structures for the ray, and so on, until an “end” ray tracing acceleration data structure or structures that provides an indication of geometry for the scene to be rendered is traversed for the ray.

According to the technology described herein the traversal of the ray tracing acceleration data structure is performed by the programmable execution unit executing a suitable shader program for the ray tracing operation. In particular, and as explained above, in the technology described herein, the traversal operation is performed for a whole group's worth of rays together that are being processed by a corresponding group of execution threads executing the program.

However, rather than the program performing the entire traversal operation, in the technology described herein, an intersection testing circuit is in an embodiment provided that performs the actual intersection testing between the rays and the volumes represented by the nodes of the ray tracing acceleration data structure during the traversal. The ray-volume testing instruction, when executed by the execution threads in the execution thread group that is performing the traversal for the group of plural rays, thus in an embodiment causes the graphics processor to message an intersection testing circuit to cause the intersection testing circuit to perform the required testing of one or more of the rays in the group of plural rays.

That is, in embodiments, the overall ray tracing operation is performed by a programmable execution unit of the graphics processor executing a graphics processing program to perform the ray tracing operation. However, in embodiments, when the program requires a ray to be tested against a node of the acceleration data structure, as part of the ray tracing operation, the ‘ray-volume testing’ instruction(s) can be included into the program appropriately, such that when the set of instructions is executed, the execution unit is caused to message the intersection testing circuit and trigger the intersection testing circuit to perform the required intersection testing between the rays and the volumes represented by the nodes of the acceleration data structure.

In this respect, the technology described herein recognises that as part of the ray tracing operation described above there is still a need to perform many intersection tests between rays and the nodes of the acceleration data structure. The technology described herein thus recognises that it may be beneficial to provide a dedicated intersection testing circuit for this purpose that can be called using an appropriate set of one or more ‘ray-volume testing’ instructions that can be included into the program that is being executed by the graphics processor.

In other words, rather than, e.g., the programmable execution unit performing the full ray tracing ray intersection determination operation, including traversing an acceleration data structure to determine geometry that could be intersected by a ray and then determining whether any geometry is actually intersected by the ray, the programmable execution unit offloads some of that processing, and in particular (and at least) the intersection testing between the rays and the volumes represented by the nodes of the ray tracing acceleration data structure, to the intersection testing circuit.

This then has the effect of performing some of the ray tracing operation (namely the ray-volume intersection testing operations) using a circuit (hardware) that is dedicated for that purpose (rather than, e.g., performing that operation using more general programmable processing circuitry that is programmed to perform the required operation). This can then lead to accelerated and more efficient intersection testing, as compared, for example, to arrangements in which that is done by executing appropriate programs using a programmable processing circuit (which may be relatively inefficient, e.g. due to poor memory access locality for execution threads corresponding to different rays).

The use of a dedicated instruction that can be included into the program according to the technology described herein may thus facilitate the use of such an intersection testing circuit (hardware). For instance, as explained above, the instruction can be suitably incorporated into the shader program to cause the graphics processor to message the intersection testing circuit as required to perform ray-volume intersection testing for multiple rays in the group of plural rays in one testing instance. Likewise, grouping rays together for the traversal operation in the manner of the technology described herein means that the intersection testing circuit can load all of the relevant input data for the multiple rays to be tested in one go, thus saving memory bandwidth, as explained above.

The technology described herein therefore particularly facilitates the use of a dedicated circuit (hardware) in this way, to provide an overall improved (more efficient) traversal operation.

The intersection testing circuit of the graphics processor should be, and is in an embodiment, a (substantially) fixed-function hardware unit (circuit) that is configured to perform the intersection testing according to the technology described herein. The intersection testing circuit should thus comprise an appropriate fixed function circuit or circuits to perform the required operations, although it may comprise and have some limited form of configurability, in use, e.g. if desired.

There may be a single or plural intersection testing circuits, e.g. such that plural programmable execution units share a given (or a single) intersection testing circuit, and/or such that a given programmable execution unit has access to and can communicate with and use plural different intersection testing circuits. Where there are plural intersection testing circuits, each such circuit can in an embodiment operate in the manner of the technology described herein.

The intersection testing circuit (or circuits) should also, and in an embodiment does, have a suitable messaging interface for communicating with the programmable execution unit of the graphics processor as required.

Thus, in the technology described herein, during the ray tracing operation, when the traversal operation requires a ray-volume intersection testing operation to be performed for one or more rays in the group of plural rays that are performing the traversal together, the programmable execution unit in an embodiment triggers an intersection testing circuit to perform the desired (ray-volume) intersection testing for the rays in question.

As well as the intersection testing circuit, there may also be other accelerators (special purpose units) that are able to communicate with the programmable execution unit, such as a load/store unit (circuit), an arithmetic unit or units (circuit(s)), a texture mapper, etc., if desired.

The communication between the intersection testing circuit(s), etc., and the programmable execution unit can be facilitated as desired. There is in an embodiment an appropriate communication (messaging) network for passing messages between the various units. This communication (messaging) network can operate according to any desired communications protocol and standard, such as using a suitable interconnect/messaging protocol.

When the programmable execution unit requires the intersection testing circuit to perform intersection testing against a given (non-leaf) node of the ray tracing acceleration data structure for one or more rays in the group of plural rays for which the traversal operation is being performed, the programmable execution unit in an embodiment therefore sends a message to that effect to the intersection testing circuit.

The message that is sent from the programmable execution unit to the intersection testing circuit should, and in an embodiment does, contain information that is required to perform the relevant intersection testing operation. Thus it in an embodiment indicates one or more of, and in an embodiment all of, the inputs for the ray-volume intersection testing, e.g. as described below.

At least in the case where the graphics processor includes plural programmable execution units, the message in an embodiment also indicates the sender of the message (i.e. which programmable execution unit has sent the message), so that the result of the ray-volume intersection testing can be returned to the correct programmable execution unit.

The intersection testing circuit can thus be called to perform intersection testing as desired by the appropriate ray-volume intersection testing instruction (or set of instructions) being included into the program. Thus, in embodiments, when a program is generated for causing a group of plural rays to perform a traversal of the ray tracing acceleration data structure together, in the manner described above, when the traversal requires ray-volume intersection testing to be performed in respect of a node of the ray tracing acceleration data structure, an appropriate set of one or more ray-volume intersection testing instructions can be included into the program that when executed will cause the programmable execution unit to message the intersection testing circuit to cause the intersection testing circuit to perform the required ray-volume intersection testing for the node (and to return the output to the programmable execution unit).

However, other arrangements would be possible. For instance, rather than messaging hardware to perform the intersection testing, the instruction could cause the shader program to jump to a suitable sub-routine to implement the required ray-volume intersection testing.

The actual intersection testing itself, however this is implemented, can be performed in any suitable fashion, as desired, e.g. in the normal fashion for ray tracing processes.

For instance, in the technology described herein, the inputs that are provided for the ray-volume intersection testing (e.g., and in an embodiment, to the intersection testing circuit, where this is present) may, and in an embodiment do, comprise:

a set of one or more rays from the group of plural rays that are performing the traversal (each ray corresponding to a respective execution thread in the execution thread group); and

a node of the acceleration data structure that is to be tested for the group of plural rays that are performing the traversal.

Each ray may be, and in an embodiment is, defined in terms of the origin (originating position (e.g. x, y, z coordinates)) for the ray that is to be tested (for which the traversal of the ray tracing acceleration data structure is to be determined); the direction of (a direction vector for) the ray that is to traverse the ray tracing acceleration data structure; and the range (distance) that the ray is to traverse (the (minimum and/or maximum) distance the ray is to traverse into the scene).

The set of one or more rays that are input for testing may in someembodiments be the set of all of the rays in the group of plural rays.That is, in some cases, the whole group of rays is input for testing,and then tested accordingly. However, in other embodiments the set ofone or more rays that is input for testing comprises a subset of raysfrom within the whole group of rays, the subset comprising a subset ofrays that are to be tested for the node in question (e.g. since theyhave been found to potentially intersect one or more volumes associatedwith the node, e.g. in a previous testing instance). This can beindicated appropriately, e.g. using a suitable bit ‘mask’ thatidentifies which rays in the group of plural rays should be testedagainst the node in question (this will be explained further below).

The node is associated with a set of one or more volumes within the scene. For instance, and in embodiments, each (non-leaf, or ‘internal’) node of the ray tracing acceleration data structure may be, and in embodiments is, associated with a set of plural child node volumes. The associated volumes are thus also obtained as input for the ray-volume intersection testing. These may be obtained in any suitable fashion. For instance, the node volumes may be stored, e.g. in memory, such that they can be loaded in as required when a ray-volume intersection testing instruction is executed in respect of a given node.

The node volume data for a (non-leaf/internal) node (the child node volumes associated with the node) may be stored, e.g. in memory, in any suitable and desired manner.

For example, each child node volume generally corresponds to a three-dimensional cuboid (e.g. cube) within the scene. The position (and size) of the child node volume can thus be defined with reference to its eight vertices. A (each) vertex may be defined in terms of a set of x, y, z co-ordinates defining the position of the vertex (either in terms of its ‘absolute’ position within the scene, or in some embodiments relative to another position, e.g. a position of a parent node vertex, as will be explained further below). Thus, a child node volume could be defined in terms of a set of eight three-dimensional co-ordinates (so 24 co-ordinate values in total).

In embodiments, however, rather than storing co-ordinates for each of the (eight) vertices that define the volume, the child node volumes are instead defined in terms of a suitable reduced set, e.g. pair, of vertices such as, for example, the bottom left (‘minimum’) and top right (‘maximum’) vertices of the volume, and it is only the co-ordinate values for these (two) vertices that are stored for a respective child node. It will be appreciated that storing the opposite corners is enough information to define the entire volume, as this will include the maximum and minimum values along each of the x, y, z axes. Thus, in embodiments, each child node volume is defined in terms of a suitable pair of vertices, with a corresponding set of three-dimensional (x, y, z) co-ordinates stored for each of these vertices (and only these two vertices). This reduces the number of co-ordinate values stored per child node volume from 24 to six, compared to storing the full set of vertices for the volume.
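By way of illustration only, the following Python sketch shows why a minimum and a maximum vertex suffice, assuming (as the reference to per-axis minimum and maximum values suggests) that the volume is axis-aligned. The function name box_corners is hypothetical and is not part of the technology described herein.

```python
from itertools import product

def box_corners(vmin, vmax):
    """Return the eight corners of an axis-aligned box.

    Assumption: the box is axis-aligned, so every corner takes either the
    minimum or the maximum value independently on each of the x, y, z axes.
    """
    # zip pairs up (min, max) per axis; product picks one value per axis.
    return [tuple(corner) for corner in product(*zip(vmin, vmax))]

corners = box_corners((0.0, 0.0, 0.0), (1.0, 2.0, 3.0))
```

Here six stored values are expanded back into the 24 co-ordinate values of the eight corners, so nothing is lost by storing only the two opposite vertices.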

In that case, the sets of co-ordinates defining the two vertices for a child node volume (i.e. the minimum and maximum values along each of the three axes) could simply be stored in full, e.g. in 32-bit floating point format. With that format, 4 bytes of data would be needed to store each co-ordinate, so the three co-ordinates for the two vertices would be stored in total using 24 bytes of data.

In embodiments, however, the child node volume data is stored in an encoded (e.g. compressed) manner, e.g., and in an embodiment, to facilitate more efficient memory access. In other words, the technology described herein recognises that there is scope for improvements in how such node data is stored.

For example, to maximise memory access efficiency, data from external (e.g. main) memory is typically (and in an embodiment) accessed in “bursts”, with the graphics processor in an embodiment being configured to read in a certain amount of data (a “block”) in a single memory transaction (burst). This memory access unit size (block size) is in an embodiment common to all elements of the graphics processor. For example, where a cache system is used to transfer data from external memory to the graphics processor, the cache is in an embodiment arranged, and the cache lines are in an embodiment sized, in order to facilitate fetching blocks of data in this manner. Thus, a single “block” of data may be stored in a set of one or more (integer) cache lines.

In embodiments, the child node volume data for a given (non-leaf/internal) node is thus stored in such a manner as to facilitate more efficient memory access. For example, as mentioned above, each (non-leaf/internal) node of the ray tracing acceleration data structure is in an embodiment associated with, and stores respective volumes for, a plurality of (e.g., and in an embodiment, six) child nodes. In an embodiment, the child node data for a given (non-leaf/internal) node is stored in such a manner as to facilitate accessing more, e.g., and in an embodiment, all, of the child node volumes associated with the node in a single memory transaction.

That is, in embodiments, all of the child node volume data for a respective node is stored in a single “block” of data having a size corresponding to the amount of data that can be accessed in a single memory transaction. Thus, where a cache system is used, all of the child node volume data for a respective node is in an embodiment stored in a single cache line (or at least a set of cache lines that can be accessed in a single memory transaction).

In this respect, the technology described herein recognises that being able to store more (e.g. all) of the child node volume data associated with a given node in a single block, such that the data can be obtained in a single memory transaction, can improve the overall efficiency of performing the ray tracing traversal operation that uses the ray tracing acceleration data structure. For example, when a ray (or group of rays) is to be tested for intersection against a set of child node volumes associated with a (non-leaf/internal) node, the graphics processor is able to obtain a plurality of (e.g. and in an embodiment all of) the child node volumes associated with the node that is to be tested in a single memory transaction. This therefore reduces the number of memory transactions and speeds up the fetching of the required child node data (the child node volumes).

To facilitate compressing the child node volume data, in an embodiment, the vertices for the child node volumes are suitably encoded as differences relative to one of the vertices for the parent node volume encompassing the child node volumes. Thus, rather than storing the co-ordinates for the child node volume vertices in a full format that defines the absolute position of the child node volume within the scene, the vertex co-ordinates are in an embodiment instead stored as differences relative to an ‘origin’ co-ordinate corresponding to one of the vertices defined for the parent node volume. In an embodiment, the child node vertices are stored as differences relative to the bottom left (minimum) vertex of the associated parent node volume. Other arrangements would however in principle be possible.

Thus, in embodiments, each vertex co-ordinate for a child node volume is stored as a difference value relative to the origin co-ordinate. This therefore reduces the dynamic range of co-ordinates that needs to be stored since the child node volume co-ordinates can only vary within the dimensions of the parent node volume. This can therefore reduce the amount of data required to store each child node vertex, since only a reduced co-ordinate range needs to be signalled. For example, rather than storing the vertices in a full, e.g. 32-bit floating point format, the values that are stored in respect of each of the child node vertices can be heavily quantised. For instance, in embodiments, as will be explained further below, an 8-bit integer value may be stored for each vertex co-ordinate.

To further reduce the amount of data required to store the child node volumes, rather than storing the difference values for the child node vertices in full, in an embodiment a ‘base’ value is stored for each of the child node vertex co-ordinates that can be modified (e.g. scaled) using an appropriate modifier to determine the actual value. Thus, in embodiments, a set of modifier values is also stored in the same data structure as the base values, which modifier values can be suitably applied to the respective co-ordinate base values for each of the child node vertices and used to determine the child node volume co-ordinates (e.g. relative to the origin position) accordingly. Thus, in an embodiment, for each child node vertex that is being stored, the co-ordinates are encoded using a set of one or more modifier values that are stored for the (non-leaf) node as a whole.

The modifier values may in general take any suitable form as desired. For example, in embodiments the modifier values may comprise offsets that are to be applied to the respective co-ordinate base values for each of the child node vertices. In an embodiment, however, the modifier values comprise scaling factors that are to be applied to the ‘base’ values stored for each of the child node vertex co-ordinates in order to determine the actual co-ordinates defining the child node that are to be used for the ray tracing process. Thus, applying the modifier values in an embodiment comprises multiplying the base value for the child node vertex co-ordinate by the appropriate modifier value (i.e. scaling factor). In that respect, note that a respective modifier value (scaling factor) may be, and in embodiments is, stored for each axis (such that there are three modifier values for the respective x, y, z axes). Various other arrangements would however be possible.

Thus, in embodiments, for each child node vertex co-ordinate that is being stored, a unique base co-ordinate value is stored. The unique base co-ordinate value is determined such that the true co-ordinate value (e.g. relative to the origin position) can be determined by applying the respective modifier value to the base co-ordinate value. For instance, the modifier values can be (and in embodiments are) determined by appropriately dividing the parent node volume into child node volumes such that there is a continuous set of child node volumes defined within the parent node volume. The step size for the child node volume boundaries is thus effectively determined by the set of modifier values (which as mentioned above are determined based on the size of the parent node volume to divide the parent node volume into suitable child node volumes). Once a set of suitable modifiers has been determined, the co-ordinate values can then be encoded appropriately, e.g. by applying the inverse modification to the true value (e.g. by dividing the difference by a scaling factor) to determine the base values that are stored and to which the modifiers should be applied to determine the true values when needed.

In this respect, it will be appreciated that the exact sizes of the child node volumes are essentially arbitrary so long as they fully cover the parent node volume to allow the ray tracing operation to be performed. For example, it is always acceptable to test a ray (or group of rays) against a larger child node volume than necessary when determining whether or not the child node volume may contain geometry, since this will be resolved during the final ray-primitive intersection testing for the leaf node at the end of the traversal of that branch.

This encoding of the co-ordinate values in turn allows further opportunities for reducing the amount of data that is stored since the base co-ordinate values can be heavily quantised, so long as the quantisation is performed conservatively such that the resulting child node volumes can only become larger. For instance, rather than storing a full (e.g.) 32-bit floating point co-ordinate, the co-ordinate is effectively converted to an offset from the parent co-ordinate, which offset is in turn stored as (e.g.) an 8-bit integer value that is modified (e.g. scaled) by the modifier value to determine the full co-ordinate. Encoding the vertex co-ordinates in this way can thus significantly reduce the amount of data required to store a child node volume.
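A minimal sketch of one possible conservative encoding is given below, purely as an illustration. It assumes scaling-factor modifiers, takes the parent minimum vertex as the origin, and chooses each per-axis scaling factor as the parent extent divided by 255; that choice, the function name encode_child, and the use of floating point scaling factors (rather than any particular 8-bit modifier representation) are all assumptions made for the example. Minimum co-ordinates are rounded down and maximum co-ordinates are rounded up, so the decoded volume can only grow.

```python
import math

def encode_child(parent_min, parent_max, child_min, child_max, levels=255):
    """Quantise a child box to 8-bit base values relative to the parent minimum.

    Assumptions: non-degenerate parent box; scale = parent extent / levels.
    """
    scales = tuple((hi - lo) / levels for lo, hi in zip(parent_min, parent_max))
    # Round the minimum vertex down so the encoded box never shrinks.
    base_min = tuple(
        max(0, math.floor((c - o) / s))
        for c, o, s in zip(child_min, parent_min, scales)
    )
    # Round the maximum vertex up for the same reason.
    base_max = tuple(
        min(levels, math.ceil((c - o) / s))
        for c, o, s in zip(child_max, parent_min, scales)
    )
    return scales, base_min, base_max

parent_min, parent_max = (0.0, 0.0, 0.0), (255.0, 510.0, 127.5)
child_min, child_max = (10.3, 21.0, 5.2), (20.7, 40.5, 9.9)
scales, base_min, base_max = encode_child(parent_min, parent_max, child_min, child_max)

# Decode again (origin + base * scale) to check that the result is conservative.
decoded_min = tuple(o + b * s for o, b, s in zip(parent_min, base_min, scales))
decoded_max = tuple(o + b * s for o, b, s in zip(parent_min, base_max, scales))
```

In this example the decoded box encloses the original child box on every axis. A production encoder would also have to guard against floating point rounding in the division, which this sketch does not attempt.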

Thus, in embodiments, the child node volume data stored for a (non-leaf) node comprises a set of one or more (e.g. three) modifier values and, for each child node for which a child node volume is being stored, a unique set of base co-ordinate values. In order to determine the child node volumes, the appropriate modifier values are thus applied to the set of base co-ordinate values to give a suitable set of scaled child node volume co-ordinates.

In an embodiment, the scaled child node volume co-ordinates define the position of the child node volume relative to an origin co-ordinate value, e.g., and in embodiments, corresponding to one of the parent vertices, for example the minimum (bottom left) vertex. In embodiments, therefore, the origin co-ordinate value is also stored as part of the same data structure, e.g., and in an embodiment, that can be stored and accessed as a single memory block.

The effect of all this is to further reduce the amount of data required to be stored for defining the child node volumes.

For example, a typical graphics processing system may be configured to access memory blocks having a size of 64 bytes. Thus, the memory system is configured to access data in 64-byte blocks, and this is also in embodiments the size of a single cache line, and so on.

Considering an example of a child node volume that is expressed using two vertices, but with each of the vertices' co-ordinates defined in a 32-bit floating point format, in that case, each vertex requires 12 bytes of data to be stored, and each child node volume therefore requires 24 bytes of data. Thus, a typical 64-byte cache line would only be able to store two child node volumes. In cases where the node may be associated with up to six child nodes, that means that up to three separate memory transactions would be required to fetch all of the child node data.

On the other hand, by storing the child node data in an encoded form, e.g. in the manner of the embodiments described above, it is possible to reduce the amount of data that is required to be stored for each vertex, and thereby reduce the overall amount of data stored for the node. For example, the base co-ordinate values for each vertex can in an embodiment be stored using (only) 3 bytes of data (e.g. 8 bits for each of the x, y, z base co-ordinate values). Thus, 6 bytes of data are used for storing the base co-ordinate values for the two vertices defining the child node volume. The three modifier values can in an embodiment each be stored as 8-bit unsigned values, but these modifier values are only stored once (and applied to all of the child nodes for which data is being stored). Thus, storing the modifier values adds only another 3 bytes of data.

In this case, it is therefore possible to store the vertices for defining up to six child node volumes using 39 bytes of data. The node vertex data can therefore easily fit within a 64-byte cache line (together with any other child node metadata, e.g. an indication of the child node type, that may also need to be stored with the child node volumes). This is therefore a significant improvement compared to other possible approaches, e.g. compared to storing the child node volumes in full.
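The arithmetic in the example above can be checked with the following short sketch, which simply recomputes the byte counts and the number of 64-byte blocks needed in each case. The constant and function names are illustrative only.

```python
BLOCK_BYTES = 64          # example memory access block (cache line) size
CHILDREN = 6              # up to six child nodes per parent node
VERTICES_PER_VOLUME = 2   # minimum and maximum vertex
AXES = 3                  # x, y, z

def blocks_needed(total_bytes, block_bytes=BLOCK_BYTES):
    """Number of whole blocks (memory transactions) needed for total_bytes."""
    return -(-total_bytes // block_bytes)  # integer ceiling division

# Full format: every co-ordinate is a 4-byte (32-bit) floating point value.
full_bytes = CHILDREN * VERTICES_PER_VOLUME * AXES * 4

# Encoded format: one byte per base co-ordinate plus three one-byte modifiers.
encoded_bytes = CHILDREN * VERTICES_PER_VOLUME * AXES * 1 + AXES * 1

full_blocks = blocks_needed(full_bytes)
encoded_blocks = blocks_needed(encoded_bytes)
```

With these example figures the full format needs 144 bytes (three blocks), whereas the encoded format needs 39 bytes (one block).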

It will be appreciated that the numbers given above are only presented by way of example and other arrangements would be possible, e.g. depending on the memory access size and the desired level of precision. Overall, however, it will be appreciated that the embodiments described above facilitate a more efficient storage of the child node volume data.

Thus, in embodiments, obtaining the child volumes associated with a parent node for input to the ray-volume intersection testing comprises: obtaining a set of node volume data indicative of respective volumes of child nodes associated with the parent node, the node volume data comprising, for each child node for which volume data is stored, a respective set of base co-ordinate values, and the node volume data further comprising a set of one or more modifier values (e.g. scaling factors) that are to be applied to the respective base co-ordinate values for the child nodes in order to determine an associated volume for the child node.

When the child node volumes are subsequently required, e.g. as part of the ray-volume intersection testing described above, the method in an embodiment further comprises: for each child node, applying the set of one or more modifier values to the respective base co-ordinate values for the child node to determine a set of modified co-ordinate values usable to determine an associated volume for the child node.

It is believed that storing such node data in this way may be novel and inventive in its own right.

Accordingly, a further embodiment of the technology described herein comprises a method of accessing node volume data for use by a graphics processor when rendering a frame that represents a view of a scene comprising one or more objects using a ray tracing process, wherein the ray tracing process uses a ray tracing acceleration data structure indicative of the distribution of geometry for the scene to be rendered to determine geometry for the scene that may be intersected by a ray being used for a ray tracing operation, the ray tracing acceleration data structure comprising a plurality of nodes, each node associated with a respective one or more volumes within the scene, and wherein the plurality of nodes includes at least one parent (i.e. internal or non-leaf) node that is associated with a respective set of plural child nodes, with the parent node volume encompassing the volumes of its respective child nodes, the ray tracing process comprising testing rays for intersection with the volumes represented by the nodes of the acceleration data structure to determine geometry for the scene to be rendered that may be intersected by the rays;

the method comprising:

obtaining for a parent node to be tested a set of node volume data indicative of respective volumes of child nodes associated with the parent node, the node volume data comprising, for each child node for which volume data is stored, a respective set of base co-ordinate values, and the node volume data further comprising a set of one or more modifier values that are to be applied to the respective base co-ordinate values for the child nodes in order to determine an associated volume for the child node.

A further embodiment of the technology described herein comprises a graphics processor that is operable to render a frame that represents a view of a scene comprising one or more objects using a ray tracing process, wherein the ray tracing process uses a ray tracing acceleration data structure indicative of the distribution of geometry for the scene to be rendered to determine geometry for the scene that may be intersected by a ray being used for a ray tracing operation, the ray tracing acceleration data structure comprising a plurality of nodes, each node associated with a respective one or more volumes within the scene, and wherein the plurality of nodes includes at least one parent (i.e. internal or non-leaf) node that is associated with a respective set of plural child nodes, with the parent node volume encompassing the volumes of its respective child nodes, the ray tracing process comprising testing rays for intersection with the volumes represented by the nodes of the acceleration data structure to determine geometry for the scene to be rendered that may be intersected by the rays;

the graphics processor comprising:

a memory interface circuit that is configured to obtain, from external (e.g. main) memory, for a parent node to be tested a set of node volume data indicative of respective volumes of child nodes associated with the parent node, the node volume data comprising, for each child node for which volume data is stored, a respective set of base co-ordinate values, and the node volume data further comprising a set of one or more modifier values that are to be applied to the respective base co-ordinate values for the child nodes in order to determine an associated volume for the child node.

In an embodiment, the method further comprises: for each child node, applying the set of one or more modifier values to the respective base co-ordinate values for the child node to determine a set of modified co-ordinate values usable to determine an associated volume for the child node. That is, once the node volume data is obtained, it is in an embodiment then used to determine the associated volume for the child node (which volume can be (and is) in turn used for the ray-volume intersection testing).

As mentioned above, in embodiments, the node volume data further comprises an origin co-ordinate relative to which the child node volume is defined. For example, the origin co-ordinate may be a co-ordinate of a vertex of the parent node volume, e.g., and in an embodiment, the co-ordinate of the minimum (bottom left) vertex of the parent node volume.

In that case, the associated volume for a child node is determined by applying the modifier values to the base co-ordinate values for the child node and adding the resulting set of modified co-ordinate values to the origin co-ordinate, thereby defining the child node volume within the scene.
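The decoding step just described might be sketched as follows, assuming scaling-factor modifiers (one per axis) and an origin at the parent minimum vertex. The function name decode_child_volumes and the tuple-based data layout are hypothetical and chosen only for readability.

```python
def decode_child_volumes(origin, scales, children):
    """Recover (min, max) vertex pairs for each child node.

    origin:   (x, y, z) origin co-ordinate, e.g. the parent minimum vertex
    scales:   per-axis scaling factors (the modifier values; an assumption)
    children: list of (base_min, base_max) integer triples, one per child
    """
    volumes = []
    for base_min, base_max in children:
        # modified co-ordinate = base * scale; absolute = origin + modified
        vmin = tuple(o + b * s for o, b, s in zip(origin, base_min, scales))
        vmax = tuple(o + b * s for o, b, s in zip(origin, base_max, scales))
        volumes.append((vmin, vmax))
    return volumes

volumes = decode_child_volumes(
    origin=(100.0, 0.0, -50.0),
    scales=(1.0, 2.0, 0.5),
    children=[((10, 10, 10), (21, 21, 20))],
)
```

Because the modifier values and origin are stored once per parent node, the same scales and origin arguments are reused for every child in the list.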

Various other arrangements would of course be possible. In embodiments, the node volume data may also comprise any other information, e.g. child node metadata, that may be required to be stored for the ray tracing process. For example, this may include at least indications of child node ‘type’, i.e. whether the child node is an ‘internal’ (non-leaf) node or whether the child node is, for example, an end (leaf) node containing primitives, or a node that is not used.

As mentioned above, a given parent (non-leaf/internal) node may in an embodiment be associated with up to six respective child nodes. In embodiments, the node volume data stores encoded volume data for each of the (six) child nodes associated with the parent node.

If desired, padding may be performed to align the node volume data to the size of a memory access block.
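As a rough illustration of such padding, the sketch below packs one node's volume data into a single 64-byte block using Python's standard struct module. The field order, the 32-bit floating point origin, and the omission of child node metadata are assumptions made for the example; the text does not prescribe a particular layout, nor how the 8-bit modifier values map to scaling factors.

```python
import struct

BLOCK_BYTES = 64  # example memory access block size

# Hypothetical layout (little-endian, no implicit alignment):
#   3 x float32 origin, 3 x uint8 modifiers,
#   6 children x 2 vertices x 3 axes = 36 x uint8 base values.
PAYLOAD_FORMAT = "<3f3B36B"
payload_bytes = struct.calcsize(PAYLOAD_FORMAT)
padding_bytes = (-payload_bytes) % BLOCK_BYTES   # pad up to a whole block
NODE_FORMAT = f"{PAYLOAD_FORMAT}{padding_bytes}x"

def pack_node(origin, modifiers, bases):
    """Pack one node's volume data into a block-aligned byte string."""
    return struct.pack(NODE_FORMAT, *origin, *modifiers, *bases)

def unpack_node(blob):
    """Split a packed block back into (origin, modifiers, bases)."""
    values = struct.unpack(NODE_FORMAT, blob)
    return values[0:3], values[3:6], values[6:42]

blob = pack_node((1.0, 2.0, 3.0), (4, 5, 6), list(range(36)))
```

Under these assumptions the payload occupies 51 bytes (12 for the origin plus the 39 bytes discussed above), leaving 13 bytes of padding that could instead hold child node metadata.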

As will be appreciated by those skilled in the art, these additional embodiments of the technology described herein relating to the storing of the node data can, and in an embodiment do, include any one or more or all of the features of the other aspects of the technology described herein, as appropriate.

Thus, for example, once the co-ordinates for a child node volume have been determined, the child node volume can be provided as input to the ray-volume intersection testing accordingly, and used to determine whether or not that child node potentially contains geometry, and in turn to control how the traversal operation proceeds. The benefit of these additional embodiments is therefore that the child node volume data that needs to be tested against a given ray (or group of rays) can be accessed more efficiently, using fewer memory transactions, which can in turn speed up the overall traversal operation. The ray-volume intersection testing itself can then be performed by testing each of the rays in the input set of one or more rays against the one or more volumes associated with the node. This can be done in any suitable manner, as desired, e.g. in the normal way for ray tracing operations.

In embodiments a plurality of rays are tested against a plurality of child volumes associated with a node. This is in an embodiment done in an iterative manner, e.g. by testing a first ray in the group against each of the child volumes in turn, and then moving on to the second ray, and so on, until all of the rays in the group that need to be tested have been tested against all of the child volumes. Various arrangements would be possible in this regard.

In some embodiments the node volumes may be stored, e.g. in memory, in a compressed format. In that case, it may be beneficial to perform the intersection testing at a corresponding, lower resolution. In this respect it will be appreciated that there is no harm in performing the ray-volume testing more conservatively (at a lower resolution) during the traversal operation, as the actual ray-primitive intersections will be determined in a subsequent step, e.g. once the traversal is complete and it has been determined which geometry may be intersected.

Once the required intersection testing in respect of a node has completed, the result of the ray-volume intersection testing for the node is then returned as output. The outputted result can then be used by the program that is performing the traversal, e.g. to determine which nodes of the ray tracing acceleration data structure to test next. For example, if it is determined that a particular set of child nodes of the node being tested are potentially intersected by a ray, the outputted result should indicate that set of child nodes, to cause the traversal program to trigger ray-volume intersection testing with those nodes, and so on.

A suitable traversal record is thus in an embodiment maintained to track and manage which nodes should be tested during the traversal operation. The traversal record thus in an embodiment includes as entries indications of which nodes of the ray tracing acceleration data structure should be tested (i.e. which nodes have volumes that have been determined to be intersected by a ray in the group of plural rays performing the traversal operation).

The traversal record may generally take any suitable form, e.g. as may suitably be used for managing such ray tracing traversal operations, but in an embodiment comprises a traversal ‘stack’, as mentioned above.

Thus, during the traversal operation, when (and whenever) it is determined by an instance of ray-volume testing that a (child) node represents a subset of geometry that may be intersected by a ray in the group of plural rays performing the traversal operation, an indication of, e.g. pointer to, the node is then included into (e.g. pushed to) the traversal record so that the entry can subsequently be read out (popped) from the traversal record by the shader program to cause the rays to be tested against that node, accordingly, and so on.

The traversal record can then be worked through with the record entries being read out (popped) accordingly and provided to the shader program to determine which nodes are to be tested next. In the case of a traversal stack, this is in an embodiment managed using a ‘last-in-first-out’ scheme with the node intersections being pushed to/popped from the stack appropriately. However, various arrangements would be possible in that respect.

Thus, the ray-volume intersection testing in an embodiment determines which (child) nodes are potentially intersected by a ray, and for each child node that is determined to be intersected by a ray, a pointer to that child node is pushed to the traversal record. The record is then worked through as the traversal program is executed to cause those nodes to be tested. The output of the ray-volume testing is thus in an embodiment returned in the form of an updated state of the traversal record, e.g. indicating which (child) nodes are hit and need to be tested at the next level, and so on.

The result of the ray-volume intersection testing thus in an embodiment comprises a set of one or more nodes (e.g. child nodes of the node being tested) that were determined from the ray-volume intersection testing as being intersected by at least one of the rays that were tested. In that case, the current state of the traversal record (which is in an embodiment shared between all execution threads in the execution thread group) is in an embodiment also provided as input for the ray-volume intersection testing, together with the information identifying the node and rays to be tested. The updated state of the traversal record is in an embodiment then returned as output.

The traversal record is thus a list of which nodes are intersected, and therefore contain geometry that may be intersected, and the traversal operation is performed by working through the stack entries until the potential intersections with the volumes of the lower level nodes have been determined.

In the technology described herein the traversal operation is performed for a group of plural rays together. This means that the traversal record can be and in an embodiment is managed for the group of plural rays as a whole (in an embodiment using a set of shared, common registers for the corresponding plurality of execution threads processing the rays in the group of rays). This means that there is no need to load/store individual traversal records for each ray.

Because the traversal record is in an embodiment shared by a plurality of rays, it is in an embodiment also tracked which rays in the group of plural rays need to be tested for which nodes.

For instance, the ray-volume intersection testing could simply be performed for all of the rays in the group for each node that is to be tested (in that case whenever any of the rays in the group of plural rays are found to intersect the volume associated with a particular (child) node, all of the rays in the group of plural rays may then be tested against that node, regardless of which of the rays were actually determined to intersect the volume). However, this may be relatively inefficient.

Thus, in embodiments, the result of the intersection testing comprises an indication of a node that needs to subsequently be tested (e.g. because its associated volume was intersected by the rays that were tested, as described above), together with an indication of which of the rays that were tested were determined to intersect the volume associated with the node, and should therefore be tested against the node when the traversal program reaches that node.

This indication is in an embodiment provided in the form of a bit ‘mask’ indicating which rays should be tested, and which bit mask is in an embodiment also pushed to the traversal stack as a result of the intersection testing. That is, the ray-volume intersection testing in an embodiment determines which rays intersect which (child) volumes associated with the node being tested, and an indication of which rays have been found to intersect which volumes, e.g. in the form of the bit mask associated with each node that is determined as being intersected by a ray, is returned as a result of the intersection testing.

Thus, when a node is popped from the traversal record (e.g. stack) for ray-volume intersection testing, before performing the ray-volume intersection testing, the active mask can be (and in an embodiment is) read to determine which of the rays in the group of plural rays need to be tested against that node, and in an embodiment only those rays are tested.

Thus, when loading the instruction for testing against a given node, it is in an embodiment first determined using the bit mask which rays in the group of rays performing the traversal operation together should be tested against the node. The desired ray-volume intersection testing can then be performed for the node for those rays (and only those rays), and so on, to determine which geometry is potentially intersected by which rays.
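For illustration, a per-node bit mask of this kind could be built from intersection results as in the sketch below, where bit i of the mask is set when ray (execution thread lane) i hit the child volume. The dictionary-based input, the node names and the function name build_ray_masks are assumptions made for the example.

```python
def build_ray_masks(hits_per_child):
    """Turn per-child lists of hitting ray indices into (child, mask) entries.

    Children that no ray hit produce no entry, so they are never pushed.
    """
    entries = []
    for child, ray_indices in hits_per_child.items():
        mask = 0
        for ray in ray_indices:
            mask |= 1 << ray   # set the bit for this ray / thread lane
        if mask:
            entries.append((child, mask))
    return entries

# A traversal stack shared by the whole group of rays (hypothetical contents).
stack = []
stack.extend(build_ray_masks({"child_a": [0, 2], "child_b": [], "child_c": [3]}))
```

Here rays 0 and 2 hit the first child, giving mask 0b0101, ray 3 hits the third child, giving mask 0b1000, and the second child is not pushed at all.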
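Putting these pieces together, a group traversal with a shared last-in-first-out stack and per-node ray masks might be sketched as follows. The callbacks children_of, is_leaf and hit_test, and the toy tree used to exercise them, are hypothetical stand-ins for the acceleration data structure and the ray-volume intersection testing; this is a software illustration and not a description of the instruction or circuit behaviour.

```python
def traverse(root, num_rays, children_of, is_leaf, hit_test):
    """Traverse a tree for a group of rays using one shared stack.

    Returns a dict mapping each reached leaf to the mask of rays that
    should go on to ray-primitive testing for that leaf.
    """
    stack = [(root, (1 << num_rays) - 1)]   # all rays test the root
    leaf_masks = {}
    while stack:
        node, mask = stack.pop()            # last-in-first-out
        if is_leaf(node):
            leaf_masks[node] = leaf_masks.get(node, 0) | mask
            continue
        for child in children_of(node):
            child_mask = 0
            for ray in range(num_rays):
                # Only rays flagged in the popped mask are tested at all.
                if mask & (1 << ray) and hit_test(ray, child):
                    child_mask |= 1 << ray
            if child_mask:
                stack.append((child, child_mask))
    return leaf_masks

tree = {"root": ["A", "B"], "A": ["A0", "A1"], "B": ["B0"]}
box_hits = {"A": {0, 1}, "B": {1}, "A0": {0}, "A1": {1}, "B0": {0, 1}}
result = traverse(
    "root", 2,
    children_of=lambda n: tree[n],
    is_leaf=lambda n: n not in tree,
    hit_test=lambda ray, node: ray in box_hits[node],
)
```

Note that ray 0 is never tested against leaf B0 in this example, even though it would hit that volume, because it missed the parent B and so its bit was not set in the mask pushed for B.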

In an embodiment, the traversal record is managed using a set of common registers shared by the execution threads in the group of plural execution threads that are processing the group of plural rays performing the traversal operation. The current traversal record is thus loaded into all lanes of one general purpose register. The registers may be configured in any suitable fashion, as desired.

For instance, when executing an instruction in a program, the execution unit (e.g. the appropriate functional unit, such as an arithmetic unit, of the execution unit) will typically read one or more input data values (operands), perform a processing operation using those input data values to generate an output data value, and then return the output data value, e.g. for further processing by subsequent instructions in the program being executed and/or for output (for use otherwise than during execution of the program being executed).

The input data values to be used when executing the instruction will typically be stored “locally” in an appropriate set of registers (a register file) of and/or accessible to the execution (functional) unit, and the output data value(s) generated by the execution (functional) unit when executing the instruction will correspondingly be written back to that storage (register file).

To facilitate this operation, each execution thread, when executing a shader program, will correspondingly be allocated a set of one or more registers for use by that thread when executing the shader program.

Thus when executing an instruction, an execution thread will read input data values (operands) from a register or registers of a set of one or more registers allocated to that thread, and write its output value(s) back to a register or registers of the thread's register allocation.

The data will be loaded into the registers, and written out from the registers, from and to an appropriate memory system of or accessible to the graphics processor (e.g. via an appropriate cache system (cache hierarchy)).

Thus, as well as the programmable execution unit, the graphics processor includes a group of plural registers (a register file) operable to and to be used to store data for execution threads that are executing. Each thread of a group of one or more execution threads that are executing a shader program will have an associated set of registers to be used for storing data for the execution thread (either input data to be processed for the execution thread or output data generated by the execution thread) allocated to it from the overall group of registers (register file) that is available to the programmable execution unit (and to execution threads that the programmable execution unit is executing).

That is, the execution thread group as a whole is in an embodiment also allocated one or more shared, e.g. general purpose, registers, and it is these common registers that are in an embodiment used to manage the traversal record in the technology described herein.

Where there are plural execution units, each execution unit may have itsown distinct group of registers (register file). There may also be (andin an embodiment is) a single group of registers (register file) sharedbetween plural (e.g. in an embodiment all) of the separate executionunits.

Thus, the result of the ray-volume intersection testing is in an embodiment pushed to the traversal record that is stored using the shared registers. Likewise the traversal record entries are in an embodiment popped from the shared registers. The required push/pop operations for managing the traversal record are in an embodiment implemented, e.g., by performing suitable register shifts.
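
As a purely illustrative sketch of push/pop operations implemented by shifting, the following models a fixed-depth traversal record in which entry 0 is always the top of the stack. The class name SharedRegisterStack, the default depth of 8 and the (node, ray mask) entry layout are assumptions made for the example and are not taken from the description above.

```python
class SharedRegisterStack:
    """Fixed-depth traversal record; slot 0 is the top of the stack."""

    def __init__(self, depth=8):
        # Each slot models one shared register; None marks an empty slot.
        self.regs = [None] * depth

    def push(self, node_id, ray_mask):
        if self.regs[-1] is not None:
            raise OverflowError("traversal record full")
        # Shift every entry one slot down, then write the new top entry.
        self.regs = [(node_id, ray_mask)] + self.regs[:-1]

    def pop(self):
        top = self.regs[0]
        if top is None:
            raise IndexError("traversal record empty")
        # Shift every entry one slot up, leaving the last slot empty.
        self.regs = self.regs[1:] + [None]
        return top

    def is_empty(self):
        return self.regs[0] is None
```

Because the top entry always sits in the same slot, no separate stack pointer needs to be kept in this sketch; the cost is that every push or pop moves all of the entries.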

The group(s) of registers (register file(s)) can take any suitable anddesired form and be arranged in any suitable and desired manner, e.g.,as comprising single or plural banks, etc.

The graphics processor will correspondingly comprise appropriateload/store units and communication paths for transferring data betweenthe registers/register file and a memory system of or accessible to thegraphics processor (e.g., and in an embodiment, via an appropriate cachehierarchy).

Thus the graphics processor in an embodiment has an appropriateinterface to, and communication with memory (a memory system) of oraccessible to the graphics processor.

The memory and memory system is in an embodiment a main memory of oravailable to the graphics processor, such as a memory that is dedicatedto the graphics processor, or a main memory of a data processing systemthat the graphics processor is part of. In an embodiment, the memorysystem includes an appropriate cache hierarchy intermediate the mainmemory of the memory system and the programmable execution unit(s) ofthe graphics processor.

The traversal program will thus traverse the nodes of the ray tracingacceleration data structure, performing the required ray-volumeintersection testing for the nodes at each level of the ray tracingacceleration data structure accordingly to determine with reference tothe end (leaf) nodes of the ray tracing acceleration data structurewhich geometry (if any) may be intersected by the rays in the group ofplural rays for which the traversal operation is being performed.

Thus, the ray-volume intersection testing described above is in anembodiment performed, as required, in respect of plural nodes acrossmultiple levels of the ray tracing acceleration data structure, e.g.down to the level of the end (leaf) nodes representing the subsets ofgeometry defined for the scene.

The end result of this is thus an indication of which geometry (if any)may be intersected by the rays in the group of plural rays. In anembodiment this indication also indicates which of the rays potentiallyintersect which subsets of geometry (e.g. in the form of a bit mask, asexplained above).
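
The overall traversal for a group of rays might, as a minimal sketch, look as follows. The node layout (a dictionary of child lists and leaf flags), the function name traverse_group and the caller-supplied hits(ray_index, volume) test are all hypothetical and are used only to keep the example self-contained.

```python
def traverse_group(nodes, root, num_rays, hits):
    """Return {leaf_id: ray_mask} for the leaf nodes that rays may intersect.

    nodes maps a node id to {"children": [(child_id, volume), ...]} for an
    internal node, or to {"leaf": True} for an end (leaf) node.
    hits(ray_index, volume) is a caller-supplied ray-volume test.
    """
    results = {}
    stack = [(root, (1 << num_rays) - 1)]  # all rays start active at the root
    while stack:
        node_id, active = stack.pop()
        node = nodes[node_id]
        if node.get("leaf"):
            # Record which rays reached this subset of geometry.
            results[node_id] = results.get(node_id, 0) | active
            continue
        for child_id, volume in node["children"]:
            mask = 0
            for i in range(num_rays):
                # Only rays that are active for this node are tested.
                if (active >> i) & 1 and hits(i, volume):
                    mask |= 1 << i
            if mask:  # push only children hit by at least one ray
                stack.append((child_id, mask))
    return results
```

A ray whose bit does not appear in any of the returned masks has not been found to pass through any volume containing geometry.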

In the case that the ray was found to intersect a volume of the scenethat contains geometry defined for the scene (thus the traversaloperation found that there is geometry defined for the scene that theray potentially intersects), the result (the indication of geometry forthe scene to be rendered that may be intersected by the ray) that isreturned to the programmable execution unit should, and in an embodimentdoes, comprise an indication of the defined geometry in the volume orvolumes determined to be intersected by the ray. Thus, in the case wherea ray is found to intersect a volume that contains defined geometry,then the ray tracing acceleration data structure traversal operationshould, and in an embodiment does, return to the programmable executionunit an indication of the geometry for the volume in question.

In this case, the indication of geometry for the scene to be renderedthat may be intersected by the ray in question can indicate the geometrythat could be intersected for the ray in any suitable and desiredmanner, e.g., and in an embodiment, in dependence upon the format of theray tracing acceleration data structure that has been traversed. Thus,this could be in the form of a set of one or more primitives (e.g.points, lines or polygons, such as triangles, etc., and/or spheres,cylinders, cones, etc.) that could be intersected by the ray, and/orsome form of higher level definition and/or description of geometry thatcould be intersected by the ray, for example in the form of more generalor generic references to geometry, such as higher order representationsof geometry for the scene.

The information that is provided for the (potentially) intersectedgeometry can take any suitable and desired form, e.g., and in anembodiment, in dependence upon the form of the geometry itself. Forexample, in the case of a set of primitives (as candidates forintersection), the appropriate primitive identifiers and any associatedgeometry identifier (e.g. to which they belong) could be returned.

The Applicants have recognized that it would also be possible for thetraversal for a ray to fail to find any geometry defined for the scenethat the ray could potentially intersect, e.g. in the case when none ofthe volume of the scene that the ray passes through contains any definedgeometry for the scene.

In the case that the ray tracing acceleration data structure traversalfinds that the ray does not traverse any volume that contains definedgeometry for the scene, then the graphics processor in an embodimentreturns an appropriate response in that event. In an embodiment, the raytracing acceleration data structure traversal returns a responseindicating that nothing has been intersected by the ray (that nopotential intersection has been found) (i.e. that there has been a“miss”).

In an embodiment, in response to such a “miss” response from the ray tracing acceleration data structure traversal, the programmable execution unit performs an appropriate particular, in an embodiment selected, in an embodiment predefined, “default” operation for further processing for the sampling position in question. This could comprise, for example, assuming intersection with a bounding volume or ‘skybox’, or computing a procedural colour for the background, etc. Various other arrangements would be possible in this regard. The programmable execution unit will then shade the sampling position accordingly.

Thus, in an embodiment, the ray tracing operation acts to (and isconfigured to) determine whether any of the volumes in the scenerepresented by the ray tracing acceleration data structure traversed bythe ray contain any geometry for the scene, and in the case where theray does traverse a volume for the scene that contains geometry definedfor the scene, returns to the programmable execution unit an indicationof the geometry for the volume in question, but where the ray does nottraverse any volume that contains geometry defined for the scene,returns to the programmable execution unit an indication of that (a“miss” event).
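
The two possible responses might be sketched as follows, under the assumption that the traversal result is available as a mapping from leaf node to ray mask. The MISS sentinel, the function names and the particular sky gradient used as the "default" operation are hypothetical choices made for illustration.

```python
MISS = None  # hypothetical sentinel standing for a "miss" response


def per_ray_responses(leaf_masks, num_rays):
    """Turn {leaf_id: ray_mask} into one response per ray.

    A ray receives the sorted list of leaf nodes it may intersect, or MISS
    if it was not found to traverse any volume containing geometry.
    """
    responses = []
    for i in range(num_rays):
        leaves = sorted(leaf for leaf, mask in leaf_masks.items() if (mask >> i) & 1)
        responses.append(leaves if leaves else MISS)
    return responses


def default_miss_colour(direction):
    """Hypothetical default operation: a vertical white-to-blue sky gradient.

    direction is assumed to be a unit vector; only its y component is used.
    """
    t = 0.5 * (direction[1] + 1.0)  # map y in [-1, 1] to [0, 1]
    return tuple((1.0 - t) * 1.0 + t * c for c in (0.5, 0.7, 1.0))
```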

Thus, in an embodiment, once it has been determined which subsets ofgeometry may be intersected by a ray (from within a group of pluralrays), it is then determined which, if any, of the geometry is actuallyintersected by the ray(s), e.g. by performing suitable ray-primitiveintersection testing, as described above.

The ray-primitive intersection determination can use the informationreturned by the ray tracing acceleration data structure traversal asappropriate and desired. Thus it will, in an embodiment, use theindication of geometry that may be intersected by the ray to testwhether the geometry is actually intersected by the ray, together withany other properties, such as surface properties, indicated for thegeometry that may affect intersection of the ray or the operation thatis required.

Thus, for each leaf node (subset of geometry) that may be intersected by a ray or rays, it is then determined which geometry is actually intersected by the rays. That is, when the traversal operation to determine which geometry may be intersected by the rays in the group of plural rays determines that a particular subset of geometry (represented by a leaf node) may be intersected by a ray or rays, the graphics processor then tests the ray or rays for intersection with the individual units of geometry (primitives) accordingly.

The geometry associated with a given leaf node may be obtained in any suitable and desired manner. In an embodiment, it is obtained from memory. In that case, the geometry (e.g. a set of primitives) associated with the leaf node may be stored in such a manner as to facilitate memory access, e.g. such that all of the geometry (primitives) to be tested for a given leaf node can in an embodiment be obtained from external (e.g. main) memory in a single memory transaction (burst). For example, in an embodiment, the number of primitives that are associated with a given leaf node is selected such that all of the primitives stored within a block of memory can be obtained from external (e.g. main) memory in a single memory transaction (burst). For example, where a cache system is used, this block size may correspond to a set of one or more (integer) cache lines. This can therefore facilitate an overall more efficient ray-primitive intersection testing.
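
As a worked illustration of selecting the number of primitives per leaf node, the following sketch computes how many primitives fit in a block made up of a whole number of cache lines. All of the figures used (64-byte cache lines, 36-byte triangles stored as nine 4-byte values, an optional per-leaf header) are assumptions made for the example, and the function name is hypothetical.

```python
def max_primitives_per_leaf(cache_line_bytes, lines_per_burst, primitive_bytes,
                            header_bytes=0):
    """Largest primitive count that fits in one burst-sized block of memory."""
    if primitive_bytes <= 0:
        raise ValueError("primitive_bytes must be positive")
    block_bytes = cache_line_bytes * lines_per_burst
    # Whatever is left after the (assumed) header is divided among primitives.
    return max(0, (block_bytes - header_bytes) // primitive_bytes)
```

With these assumed figures, a single 64-byte line holds one 36-byte triangle, two lines hold three, and four lines with a 16-byte header hold six.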

This ray-primitive intersection testing can be done in any suitable fashion as desired. For instance, this may be done by the programmable execution unit itself or by suitable intersection testing hardware. Where this is performed in hardware, this may be the same or different hardware to the intersection testing circuit that performs the ray-volume intersection testing, where this is provided. For instance, in some embodiments, there may be provided dedicated (hardware) circuits for both the ray-volume and ray-primitive intersection testing.

Likewise the ray-primitive intersection testing may be performed for agroup of plural rays (which may be the same group of plural rays forwhich the traversal operation was performed, but it would also bepossible to re-group the rays for the subsequent processing after theinitial traversal operation), or may be performed for individual rays.

In whatever manner the ray-primitive intersection testing is performed,the end result of all of this is to determine which geometry (if any) isintersected by which rays.

It should be noted in this regard that while the programmable execution unit will, and in an embodiment does, use the indicated geometry to determine the geometry that is intersected by a ray, as the ray tracing acceleration data structure traversal only returns an indication of geometry that may be intersected by the ray (e.g. that is present in a volume that the ray intersects (passes into/through)), it could be that the ray will not actually intersect any of the indicated geometry. Thus while the determination of any geometry that is intersected by a ray performed by the programmable execution unit may, and typically will, result in the identification of geometry that is actually intersected by the ray, it could be the case that the intersection determination will in fact determine that there is no geometry that is intersected by the ray.

In the case that the ray-primitive intersection determination determines that there is in fact no geometry that is intersected by the ray (e.g. when the ray tracing acceleration data structure traversal operation returns a set of primitives, but none of the primitives is actually intersected by the ray), then the programmable execution unit in an embodiment treats that as a ray tracing intersection “miss” (as discussed above for the situation where the ray tracing acceleration data structure traversal does not identify any intersection for a ray), and then performs an appropriate “miss” “default” operation accordingly.
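
The description above does not prescribe a particular ray-primitive test. Purely as an example, the following sketch uses the well-known Möller–Trumbore ray-triangle algorithm, assuming triangle primitives, and returns None (a "miss") when none of the candidate primitives returned by the traversal is actually intersected. The function names and the candidate dictionary layout are hypothetical.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test: hit distance t, or None if the triangle is missed."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:  # ray is parallel to the plane of the triangle
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None  # ignore hits behind the ray origin


def closest_hit(origin, direction, candidates):
    """candidates: {prim_id: (v0, v1, v2)}. Returns (prim_id, t), or None for a miss."""
    best = None
    for prim_id, tri in candidates.items():
        t = ray_triangle(origin, direction, *tri)
        if t is not None and (best is None or t < best[1]):
            best = (prim_id, t)
    return best
```

A None result from closest_hit corresponds to the case in which the traversal returned candidate primitives but none was actually intersected, so that the "miss" handling would apply.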

The determination of which geometry is intersected by the rays is then used by the graphics processor to continue the processing (ray tracing/rendering) operations.

For instance, the operations described above can then be (and are) repeated for other groups of rays for the sampling position, and once this is done, the sampling position can then be rendered accordingly, e.g. in the usual way for ray tracing operations.

For any geometry (primitives) that it is determined is actually intersected by a ray, various processing steps can then be taken to determine the effect (e.g. appearance) this should have in the sampling position for which the ray was cast.

Thus, once the geometry that the rays will actually intersect (if any) has been determined, then the programmable execution unit performs further processing for the sampling positions in the frame that the rays correspond to in accordance with the (any) geometry for the scene determined to be intersected by the rays.

The further processing for a sampling position that is performed in thisregard can comprise any suitable and desired processing for the samplingposition as a result of the ray tracing operation for the ray inquestion, e.g., and in an embodiment, in accordance with and based onany geometry for the scene that was determined to be intersected by theray.

The further processing for a sampling position that is performed as aresult of the ray tracing operation for a ray is in an embodimentdetermined and selected in accordance with and based on the geometry ofthe scene that was determined to be intersected by the ray, and/or inaccordance with and based on the particular ray tracing-based renderingprocess that is being performed (e.g. whether the ray tracing processrequires the casting of secondary rays (where it is appropriate to dothat), and/or the casting of secondary rays of a particular type, orwhether the ray tracing-based rendering is intended to be based solelyon the first intersection point that is determined). For example, thefurther processing could be, and in an embodiment is, based on thedetermined surface type of the geometry that is intersected, and apredefined operation (e.g. in terms of the casting of any secondaryrays) for that surface type.

Other arrangements would, of course, be possible. In an embodiment, thefurther processing for a sampling position that can be (and is)performed in accordance with any geometry for the scene determined to beintersected by a ray corresponding to the sampling position comprisestriggering the casting of a further (e.g. secondary) ray into the scenefor the sampling position in question.

In an embodiment, the further processing for a sampling position in theframe that a ray corresponds to that can be (and is) performed inaccordance with any geometry for the scene determined to be intersectedby the ray also or instead (and in an embodiment also) comprisesrendering (shading) the sampling position for the frame to generate anoutput data value (colour value) for the sampling position, e.g., and inan embodiment, to be used to display the view of the scene at thesampling position for the frame in question.

Thus, in an embodiment, the further processing for a sampling positionin a frame that a ray corresponds to that is performed comprises one of:

triggering the tracing (casting) of a further (e.g. secondary) ray forthe sampling position in question; and

rendering (shading) the sampling position so as to provide an outputcolour value for the sampling position for the frame.

Correspondingly, the technology described herein in an embodimentcomprises shading the sampling position based on the intersection,and/or casting further rays into the scene based on the intersection.

As discussed above, which of these operations is performed is in anembodiment based on and in accordance with a property or properties ofthe geometry that was determined to be intersected by the ray, and theparticular ray tracing-based rendering process that is being used.
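
A minimal sketch of such a selection is given below. It assumes that each intersection carries a surface type, that "mirror" and "glass" surfaces trigger secondary rays, and that recursion is limited by a maximum depth; the surface names, the depth limit and the function name are all hypothetical and serve only to illustrate the choice between the two operations.

```python
def further_processing(hit, depth, max_depth=2):
    """Decide the next step for a sampling position.

    hit is None for a miss, otherwise a dict with a "surface" key.
    Returns ("cast", ray_kind) to trigger a further ray, or
    ("shade", reason) to render (shade) the sampling position.
    """
    if hit is None:
        return ("shade", "miss-default")
    if depth < max_depth:
        # Assumed predefined operations per surface type.
        if hit["surface"] == "mirror":
            return ("cast", "reflection")
        if hit["surface"] == "glass":
            return ("cast", "refraction")
    return ("shade", hit["surface"])
```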

The rendering (shading) of the sampling position can be performed in anysuitable and desired manner. In an embodiment, it is performed based onand in accordance with the results of the casting of the ray or rays forthe sampling position, and the determined intersected geometry (if any),and/or based on and in accordance with the particular ray tracing-basedrendering process that is being performed. For example, the rendering(shading) processing could be, and in an embodiment is, based on thedetermined surface type of the geometry that is intersected, and apredefined shading operation for that surface type.

The rendering (shading) in an embodiment takes account of all the raysthat have been cast for a sampling position and so in an embodiment isbased both on the first intersected geometry (and the properties, e.g.surface properties, of that geometry), together with the result of anyfurther (secondary) rays that have been cast for the sampling position,e.g. to determine any lighting, reflection or refraction effects.

Other arrangements would, of course, be possible. In an embodiment, therendering (shading) of the sampling position is performed once all ofthe (desired) rays have been cast for the sampling position (and thegeometry intersections (if any) for all of the rays to be cast for thesampling position in question have been determined). (As discussedabove, the ray tracing process for a given sampling position maycomprise both the determination of any geometry that is intersected by a“primary” ray that has been cast from the sampling position itself,together with the determination of geometry, etc., for any secondaryrays that have been cast for the sampling position in question, e.g. asa result of an intersection or intersections determined for the primaryray.)

Thus, in an embodiment, once the final results of the rays (the geometryintersections (if any)) have been determined for a sampling position,the programmable execution unit will then render the sampling positionin the frame, (at least) in accordance with any geometry for the scenedetermined to be intersected by rays that have been cast for thesampling position.

Again, this can be done in any suitable and desired manner, and can useany suitable and desired properties, etc., of the geometry, etc., thatis determined to be intersected by a ray or rays for the samplingposition.

Once the ray tracing based rendering process has been completed for asampling position, then that will, and in an embodiment does, asdiscussed above, generate an appropriate set of output data for thesampling position, e.g., and in an embodiment, in the form of anappropriate set of colour (e.g. RGB) data, for the sampling position.

This will be done for each sampling position in the frame (thus theoperation in the manner of the technology described herein is in anembodiment performed for plural, and in an embodiment for each, samplingposition of the frame being rendered), so that a final output frameshowing a view of the scene to be rendered will be generated, whichoutput frame can then, e.g., be written out to memory and/or otherwiseprocessed for further use, e.g. for display on a suitable display.

The process may then be repeated for a next frame (e.g. the next frame to be displayed), and so on.

In order to perform the ray-primitive intersection testing and any required subsequent processing, the programmable execution unit may, and in an embodiment does, use further information relating to the geometry (e.g. primitives), such as appropriate attributes of the geometry (e.g. primitives), such as their vertex positions, normals, surface type/materials, etc. This may be needed in order to determine the actual intersection (point), and for performing further processing in relation to the sampling position accordingly.

Thus the process in an embodiment uses information regarding theproperties of the geometry (e.g. in terms of its surface properties, thesurface it belongs to, etc.). This information can be provided in anysuitable and desired manner, but in an embodiment indexes/pointers todata structures where the data relating to the properties of thegeometry is stored are used.

In an embodiment, these properties (additional attributes) are fetchedby the programmable execution unit as appropriate, once an intersectiondetermination has been returned by the ray tracing acceleration datastructure traversal operation (e.g. by, as discussed below, executingfurther program instructions to fetch the required attributes).

It would also or instead be possible, if desired, for the indication ofthe geometry for the scene to be rendered that may be intersected by theray that is returned to the programmable execution unit by the raytracing acceleration data structure traversal operation to, as well asindicating the geometry itself, convey and/or indicate such informationregarding the properties of the geometry, e.g. in the form ofindexes/pointers to data structure(s) where data relating to theproperties of the geometry is stored.

In an embodiment, the ray tracing rendering process supports the use ofplural different geometry models, e.g., and in an embodiment, independence of the distance of the geometry from the viewpoint (camera),and/or from any lighting for the scene, etc., and the ray tracingacceleration data structure traversal operation returns with theindicated geometry an indication of which one of the different modelsshould be used for the geometry.
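
One way in which a model could be chosen in dependence on distance is sketched below. The use of sorted distance thresholds, the example threshold values and the function name select_model are assumptions made for illustration; the description above does not specify how the model is selected.

```python
import bisect


def select_model(distance, thresholds):
    """Return the index of the geometry model to use for a given distance.

    thresholds is a sorted list of distances at which the model changes;
    index 0 is the most detailed model (an assumed convention).
    """
    return bisect.bisect_right(thresholds, distance)
```

With assumed thresholds of [10.0, 50.0], geometry closer than 10 units would use model 0, geometry from 10 up to (but not including) 50 units would use model 1, and anything further away would use model 2.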

The technology described herein can be used for all forms of output thata graphics processor may output. Thus, it may be used when generatingframes for display, for render-to-texture outputs, etc. The output fromthe graphics processor is, in an embodiment, exported to external, e.g.main, memory, for storage and use.

Subject to the requirements for operation in the manner of thetechnology described herein, the graphics processor can otherwise haveany suitable and desired form or configuration of graphics processor andcomprise and execute any other suitable and desired processing elements,circuits, units and stages that a graphics processor may contain, andexecute any suitable and desired form of graphics processing pipeline.

In an embodiment, the graphics processor is part of an overall graphics(data) processing system that includes, e.g., and in an embodiment, ahost processor (CPU) that, e.g., executes applications that requireprocessing by the graphics processor. The host processor will sendappropriate commands and data to the graphics processor to control it toperform graphics processing operations and to produce graphicsprocessing output required by applications executing on the hostprocessor. To facilitate this, the host processor should, and, in anembodiment does, also execute a driver for the graphics processor and acompiler or compilers for compiling programs to be executed by theprogrammable execution unit of the graphics processor.

The overall graphics processing system may, for example, include one ormore of: a host processor (central processing unit (CPU)), the graphicsprocessor (processing unit), a display processor, a video processor(codec), a system bus, and a memory controller.

The graphics processor and/or graphics processing system may alsocomprise, and/or be in communication with, one or more memories and/ormemory devices that store the data described herein, and/or the outputdata generated by the graphics processor, and/or store software (e.g.(shader) programs) for performing the processes described herein. Thegraphics processor and/or graphics processing system may also be incommunication with a display for displaying images based on the datagenerated by the graphics processor.

The technology described herein also extends to an overall graphicsprocessing system and the operation of that system.

Thus, another embodiment of the technology described herein comprises amethod of operating a graphics processing system, the graphicsprocessing system including:

a graphics processor comprising:

-   -   a programmable execution unit operable to execute programs to perform graphics processing operations, and in which a program can be executed by plural execution threads at the same time;

the method comprising:

generating a graphics shader program or programs which, when executed by the programmable execution unit of the graphics processor, causes the graphics processor to render a frame that represents a view of a scene comprising one or more objects using a ray tracing process,

wherein the ray tracing process uses a ray tracing acceleration data structure indicative of the distribution of geometry for the scene to be rendered to determine geometry for the scene that may be intersected by a ray being used for a ray tracing operation, the ray tracing acceleration data structure comprising a plurality of nodes, each node associated with a respective one or more volumes within the scene,

the ray tracing process comprising performing for a plurality of rays a traversal of the ray tracing acceleration data structure to determine, by testing the rays for intersection with the volumes represented by the nodes of the acceleration data structure, geometry for the scene to be rendered that may be intersected by the rays;

the generating a graphics shader program or programs which, when executed by the programmable execution unit of the graphics processor, causes the graphics processor to render a frame that represents a view of a scene comprising one or more objects using a ray tracing process comprising:

-   -   including in a program to perform a ray tracing acceleration data structure traversal, wherein the program is to be executed by a group of plural execution threads, each execution thread in the group of execution threads performing a traversal operation for a respective ray in a corresponding group of rays such that the group of rays performs the traversal operation together, a set of one or more ray-volume testing instructions for testing rays for intersection with the one or more volumes associated with a given node of the ray tracing acceleration data structure that is to be tested during the traversal operation, which set of ray-volume testing instructions, when executed by execution threads of the group of plural execution threads, will cause:
    -   the graphics processor to test one or more rays from the group of plural rays that are performing the traversal operation together for intersection with the one or more volumes associated with the node being tested; and
    -   a result of the intersection testing to be returned for the node for the traversal operation;

the method further comprising:

providing the generated graphics shader program or programs to thegraphics processor for execution by the programmable execution unit; and

the programmable execution unit of the graphics processor:

executing the graphics shader program or programs to render a frame thatrepresents a view of a scene comprising one or more objects using a raytracing process; and

when a group of execution threads is executing the program or programsfor a corresponding group of rays that are performing a traversal of theray tracing acceleration data structure together, in response to theexecution threads executing the set of one or more ray-volume testinginstructions in respect of a node of the ray tracing acceleration datastructure:

testing one or more rays from the group of plural rays that areperforming the traversal operation together for intersection with theone or more volumes associated with the node being tested; and

returning a result of the intersection testing for the node for thetraversal operation;

the method further comprising:

for each ray in the group of rays performing the traversal operationtogether, determining any geometry that is intersected by the ray; and

performing further processing for a sampling position in the frame thatthe ray corresponds to in accordance with any geometry for the scenedetermined to be intersected by the ray.

Thus, another embodiment of the technology described herein comprises agraphics processing system, the graphics processing system comprising:

a graphics processor comprising:

-   -   a programmable execution unit operable to execute programs to perform graphics processing operations, and in which a program can be executed by plural execution threads at the same time;

the graphics processing system further comprising:

a processing circuit configured to:

-   -   generate a graphics shader program or programs which, when executed by the programmable execution unit of the graphics processor, causes the graphics processor to render a frame that represents a view of a scene comprising one or more objects using a ray tracing process;

the generating a graphics shader program or programs which, whenexecuted by the programmable execution unit of the graphics processor,causes the graphics processor to render a frame that represents a viewof a scene comprising one or more objects using a ray tracing processcomprising:

-   -   including in a program to perform a ray tracing acceleration data structure traversal, wherein the program is to be executed by a group of plural execution threads, each execution thread in the group of execution threads performing a traversal operation for a respective ray in a corresponding group of rays such that the group of rays performs the traversal operation together, a set of one or more ray-volume testing instructions for testing rays for intersection with the one or more volumes associated with a given node of the ray tracing acceleration data structure that is to be tested during the traversal operation, which set of ray-volume testing instructions, when executed by execution threads of the group of plural execution threads, will cause:
    -   the graphics processor to test one or more rays from the group of plural rays that are performing the traversal operation together for intersection with the one or more volumes associated with the node being tested; and
    -   a result of the intersection testing to be returned for the node for the traversal operation;

the processing circuit being further configured to:

provide the generated graphics shader program or programs to the graphics processor for execution by the programmable execution unit; and

the programmable execution unit of the graphics processor being configured to:

execute the graphics shader program or programs to render a frame that represents a view of a scene comprising one or more objects using a ray tracing process; and

when a group of execution threads is executing the program or programs for a corresponding group of rays that are performing a traversal of the ray tracing acceleration data structure together, in response to the execution threads executing the set of one or more ray-volume testing instructions in respect of a node of the ray tracing acceleration data structure:

the execution unit triggers testing of one or more rays from the group of plural rays that are performing the traversal of the ray tracing acceleration data structure together for intersection with the one or more volumes associated with the node being tested, wherein a result of the intersection testing is then returned for the node for the traversal operation;

the programmable execution unit of the graphics processor being further configured to:

for each ray in the group of rays performing the traversal operation together, determine any geometry that is intersected by the ray; and

perform further processing for a sampling position in the frame that the ray corresponds to in accordance with any geometry for the scene determined to be intersected by the ray.

As will be appreciated by those skilled in the art, these embodiments of the technology described herein can, and in an embodiment do, include any one or more or all of the features of the technology described herein.

Thus, for example, the shader program or programs that are provided to the graphics processor for execution (and that are prepared by the compiler) in an embodiment comprise a first sequence of instructions to perform appropriate graphics processing operations for a ray tracing-based rendering process up to and including the traversal operation, together with one or more sequences of instructions to be executed once a response from the traversal operation has been received (and, in an embodiment, to be executed in dependence upon the response from the ray tracing acceleration data structure, such as the geometry/surface type), which sequences of instructions will, when executed, determine any geometry that is intersected by a ray using the determined indication of the geometry returned by the ray tracing acceleration data structure traversal, and then trigger further processing in respect of a sampling position that the ray corresponds to accordingly (which further processing in an embodiment may be the casting of a further ray, and/or the rendering (shading) of the sampling position that the ray corresponds to).

Other arrangements would, of course, be possible. It will be appreciated by those skilled in the art that all of the described embodiments of the technology described herein can, and in an embodiment do, include, as appropriate, any one or more or all of the features of the technology described herein.

The technology described herein can be implemented in any suitable system, such as a suitably configured micro-processor based system. In an embodiment, the technology described herein is implemented in a computer and/or micro-processor based system. The technology described herein is in an embodiment implemented in a portable device, such as, and in an embodiment, a mobile phone or tablet.

The various functions of the technology described herein can be carried out in any desired and suitable manner. For example, the functions of the technology described herein can be implemented in hardware or software, as desired. Thus, for example, unless otherwise indicated, the various functional elements, stages, and units of the technology described herein may comprise a suitable processor or processors, controller or controllers, functional units, circuitry, circuits, processing logic, microprocessor arrangements, etc., that are operable to perform the various functions, etc., such as appropriately dedicated hardware elements (processing circuitry/circuits), and/or programmable hardware elements (processing circuitry/circuits) that can be programmed to operate in the desired manner.

It should also be noted here that, as will be appreciated by those skilled in the art, the various functions, etc., of the technology described herein may be duplicated and/or carried out in parallel on a given processor. Equally, the various processing stages, etc., may share processing circuitry/circuits, etc., if desired.

The methods in accordance with the technology described herein may be implemented at least partially using software e.g. computer programs. It will thus be seen that when viewed from further embodiments the technology described herein provides computer software specifically adapted to carry out the methods herein described when installed on a data processor, a computer program element comprising computer software code portions for performing the methods herein described when the program element is run on a data processor, and a computer program comprising code adapted to perform all the steps of a method or of the methods herein described when the program is run on a data processing system. The data processor may be a microprocessor system, a programmable FPGA (field programmable gate array), etc.

The technology described herein also extends to a computer software carrier comprising such software which when used to operate a display processor, or microprocessor system comprising a data processor causes in conjunction with said data processor said controller or system to carry out the steps of the methods of the technology described herein. Such a computer software carrier could be a physical storage intermediate such as a ROM chip, CD ROM, RAM, flash memory, or disk, or could be a signal such as an electronic signal over wires, an optical signal or a radio signal such as to a satellite or the like.

It will further be appreciated that not all steps of the methods of the technology described herein need be carried out by computer software and thus from a further broad embodiment the technology described herein provides computer software and such software installed on a computer software carrier for carrying out at least one of the steps of the methods set out herein.

The technology described herein may accordingly suitably be embodied as a computer program product for use with a computer system. Such an implementation may comprise a series of computer readable instructions either fixed on a tangible, non-transitory intermediate, such as a computer readable intermediate, for example, diskette, CD ROM, ROM, RAM, flash memory, or hard disk. It could also comprise a series of computer readable instructions transmittable to a computer system, via a modem or other interface device, over either a tangible intermediate, including but not limited to optical or analogue communications lines, or intangibly using wireless techniques, including but not limited to microwave, infrared or other transmission techniques. The series of computer readable instructions embodies all or part of the functionality previously described herein.

Those skilled in the art will appreciate that such computer readable instructions can be written in a number of programming languages for use with many computer architectures or operating systems. Further, such instructions may be stored using any memory technology, present or future, including but not limited to, semiconductor, magnetic, or optical, or transmitted using any communications technology, present or future, including but not limited to optical, infrared, or microwave. It is contemplated that such a computer program product may be distributed as a removable intermediate with accompanying printed or electronic documentation, for example, shrink wrapped software, preloaded with a computer system, for example, on a system ROM or fixed disk, or distributed from a server or electronic bulletin board over a network, for example, the Internet or World Wide Web.

Embodiments of the technology described herein will now be described by way of example only and with reference to the accompanying drawings.

The present embodiments relate to the operation of a graphics processor, e.g. in a graphics processing system as illustrated in FIG. 1, when performing rendering of a scene to be displayed using a ray tracing based rendering process.

Ray tracing is a rendering process which involves tracing the paths of rays of light from a viewpoint (sometimes referred to as a “camera”) back through sampling positions in an image plane (which is the frame being rendered) into a scene, and simulating the effect of the interaction between the rays and objects in the scene. The output data value e.g. colour of a sampling position in the image is determined based on the object(s) in the scene intersected by the ray passing through the sampling position, and the properties of the surfaces of those objects. The ray tracing process thus involves determining, for each sampling position, a set of objects within the scene which a ray passing through the sampling position intersects.

FIG. 2 illustrates an exemplary “full” ray tracing process. A ray 20 (the “primary ray”) is cast backward from a viewpoint 21 (e.g. camera position) through a sampling position 22 in an image plane (frame) 23 into the scene that is being rendered. The point 24 at which the ray 20 first intersects an object 25, e.g. a primitive (which primitives in the present embodiments are in the form of triangles, but may also comprise other suitable geometric shapes), in the scene is identified. This first intersection will be with the object in the scene closest to the sampling position.

A secondary ray in the form of shadow ray 26 may be cast from the first intersection point 24 to a light source 27. Depending upon the material of the surface of the object 25, another secondary ray in the form of reflected ray 28 may be traced from the intersection point 24. If the object is, at least to some degree, transparent, then a refracted secondary ray may be considered.

Such casting of secondary rays may be used where it is desired to add shadows and reflections into the image. A secondary ray may be cast in the direction of each light source (and, depending upon whether or not the light source is a point source, more than one secondary ray may be cast back to a point on the light source).

In the example shown in FIG. 2, only a single bounce of the primary ray 20 is considered, before tracing the reflected ray back to the light source. However, a higher number of bounces may be considered if desired.

The output data for the sampling position 22 i.e. a colour value (e.g. RGB value) thereof, is then determined taking into account the interactions of the primary, and any secondary, ray(s) cast, with objects in the scene. The same process is conducted in respect of each sampling position to be considered in the image plane (frame) 23.

In order to facilitate such ray tracing processing, in the present embodiments acceleration data structures indicative of the geometry (e.g. objects) in scenes to be rendered are used when determining the intersection data for the ray(s) associated with a sampling position in the image plane to identify a subset of the geometry which a ray may intersect.

The ray tracing acceleration data structure represents and indicates the distribution of geometry (e.g. objects) in the scene being rendered, and in particular the geometry that falls within respective (sub-)volumes in the overall volume of the scene (that is being considered). In the present embodiments, ray tracing acceleration data structures in the form of Bounding Volume Hierarchy (BVH) trees are used.

FIG. 3 shows an exemplary BVH tree 30, constructed by enclosing the complete scene in an axis-aligned bounding volume (AABV), e.g. a cube, and then recursively subdividing the bounding volume into successive sub-AABVs according to any suitable and desired, and, e.g. various, subdivision schemes (e.g. same number of objects per child, based on traversal cost, etc.), until a desired smallest subdivision (volume) is reached.

In this example, the BVH tree 30 is a wide tree wherein each bounding volume is subdivided into up to six sub-AABVs. However, in general, any other suitable tree structure may be used, and a given node of the tree may have any suitable and desired number of child nodes.
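By way of illustration only, the recursive subdivision described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the construction used in the embodiments: the names ‘Node’, ‘bounds’ and ‘build_bvh’, the median split along the longest axis, the ‘max_leaf_size’ threshold and the representation of each primitive by a single centroid point are all hypothetical choices made for the example, and the sketch builds a binary tree rather than the wide tree of FIG. 3.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    lo: tuple                                        # minimum corner of the AABV
    hi: tuple                                        # maximum corner of the AABV
    children: list = field(default_factory=list)     # empty for leaf nodes
    primitives: list = field(default_factory=list)   # filled for leaf nodes only


def bounds(points):
    """Axis-aligned bounding volume enclosing a non-empty list of 3D points."""
    lo = tuple(min(p[a] for p in points) for a in range(3))
    hi = tuple(max(p[a] for p in points) for a in range(3))
    return lo, hi


def build_bvh(centroids, max_leaf_size=2):
    """Recursively subdivide until a node holds at most max_leaf_size (>= 1) primitives.

    Assumption: each primitive is represented only by its centroid.
    """
    lo, hi = bounds(centroids)
    if len(centroids) <= max_leaf_size:
        return Node(lo, hi, primitives=list(centroids))
    # Hypothetical subdivision scheme: same number of objects per child,
    # split at the median along the longest axis of the volume.
    axis = max(range(3), key=lambda a: hi[a] - lo[a])
    ordered = sorted(centroids, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(lo, hi, children=[build_bvh(ordered[:mid], max_leaf_size),
                                  build_bvh(ordered[mid:], max_leaf_size)])
```

A cost-based scheme, or a wider fan-out such as the six children of the example tree, would change only how ‘ordered’ is partitioned; the recursion itself would stay the same.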

Thus, each node in the BVH tree 30 will have a respective volume of the scene being rendered associated with it, with the end, leaf nodes 31 each representing a particular smallest subdivided volume of the scene, and any parent node representing, and being associated with, the volume of its child nodes. Each leaf node will also correspondingly be associated with the geometry defined for the scene that falls, at least in part, within the volume that the leaf node corresponds to (e.g. whose centroid falls within the volume in question), with the leaf nodes 31 representing unique (non-overlapping) subsets of primitives defined for the scene falling within the corresponding volumes for the leaf nodes 31. The BVH tree acceleration data structure also stores (either for the nodes themselves or otherwise, e.g. as sideband information), appropriate information to allow the tree to be traversed volume-by-volume on the basis of the origin and direction of a ray so as to be able to identify a leaf node representing a volume that the ray passes through.

This then allows and facilitates testing a ray against the hierarchy of bounding volumes in the BVH tree until a leaf node is found. It is then only necessary to test the geometry associated with the particular leaf node for intersection with the ray.
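The pruning that this hierarchy allows can be illustrated with a short sketch for a single ray. The slab test used for the ray-volume check, the dictionary-based node layout and the function names ‘ray_hits_box’ and ‘candidate_leaves’ are assumptions made for illustration; the present embodiments do not prescribe any particular intersection test.

```python
def ray_hits_box(origin, direction, lo, hi):
    """Slab test: True if the ray (parameter t >= 0) passes through the box [lo, hi]."""
    t_near, t_far = 0.0, float("inf")
    for o, d, l, h in zip(origin, direction, lo, hi):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < l or o > h:
                return False
            continue
        t0, t1 = (l - o) / d, (h - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def candidate_leaves(node, origin, direction):
    """Names of the leaf nodes whose volume the ray passes through."""
    if not ray_hits_box(origin, direction, node["lo"], node["hi"]):
        return []                      # whole subtree is skipped
    if "children" not in node:
        return [node["name"]]          # leaf: its geometry must be tested
    found = []
    for child in node["children"]:
        found += candidate_leaves(child, origin, direction)
    return found
```

Only the geometry of the leaves returned by ‘candidate_leaves’ would then need to be tested against the ray; every subtree whose volume is missed is never visited.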

FIG. 4 is a flow chart showing the overall ray tracing process in embodiments of the technology described herein, and that will be performed on and by the graphics processor 2.

First, the geometry of the scene is analysed and used to obtain an acceleration data structure (step 40), for example in the form of a BVH tree structure, as discussed above. This can be done in any suitable and desired manner, for example by means of an initial processing pass on the graphics processor 2.

A primary ray is then generated, passing from a camera through a particular sampling position in an image plane (frame) (step 41). The acceleration data structure is then traversed for the primary ray (step 42), and the leaf node corresponding to the first volume that the ray passes through which contains geometry which the ray potentially intersects is identified. It is then determined whether the ray intersects any of the geometry, e.g. primitives, (if any) in that leaf node (step 43).

If no (valid) geometry which the ray intersects can be identified in the node, the process returns to step 42, and the ray continues to traverse the acceleration data structure and the leaf node for the next volume that the ray passes through which may contain geometry with which the ray intersects is identified, and a test for intersection performed at step 43.

This is repeated for each leaf node that the ray (potentially) intersects, until geometry that the ray intersects is identified. When geometry that the ray intersects is identified, it is then determined whether to cast any further (secondary) rays for the primary ray (and thus sampling position) in question (step 44). This may be based, e.g., and in an embodiment, on the nature of the geometry (e.g. its surface properties) that the ray has been found to intersect, and the complexity of the ray tracing process being used. Thus, as shown in FIG. 4, one or more secondary rays may be generated emanating from the intersection point (e.g. a shadow ray(s), a refraction ray(s) and/or a reflection ray(s), etc.). Steps 42, 43 and 44 are then performed in relation to each secondary ray.

Once there are no further rays to be cast, a shaded colour for the sampling position that the ray(s) correspond to is then determined based on the result(s) of the casting of the primary ray, and any secondary rays considered (step 45), taking into account the properties of the surface of the object at the primary intersection point, any geometry intersected by secondary rays, etc. The shaded colour for the sampling position is then stored in the frame buffer (step 46).

If no (valid) node which may include geometry intersected by a given ray (whether primary or secondary) can be identified in step 42 (and there are no further rays to be cast for the sampling position), the process moves to step 45, and shading is performed. In this case, the shading is in an embodiment based on some form of “default” shading operation that is to be performed in the case that no intersected geometry is found for a ray. This could comprise, e.g., simply allocating a default colour to the sampling position, and/or having a defined, default geometry to be used in the case where no actual geometry intersection in the scene is found, with the sampling position then being shaded in accordance with that default geometry. Other arrangements would, of course, be possible.
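The control flow of steps 41 to 45 for one sampling position can be summarised in a short sketch. It is illustrative only: the function ‘render_sample’ and its callable parameters (‘traverse’, ‘intersect’, ‘secondary_rays’, ‘shade’) are hypothetical placeholders for the operations described above, ‘traverse’ is assumed to yield candidate leaf nodes in the order the ray passes through them, and storing the result in the frame buffer (step 46) is left to the caller.

```python
def render_sample(primary_ray, traverse, intersect, secondary_rays, shade,
                  default_colour=(0, 0, 0)):
    """Sketch of steps 41-45 of FIG. 4 for a single sampling position."""
    pending = [primary_ray]                   # step 41: primary ray generated
    hits = []
    while pending:
        ray = pending.pop()
        hit = None
        for leaf in traverse(ray):            # step 42: next candidate leaf node
            hit = intersect(ray, leaf)        # step 43: test the leaf's geometry
            if hit is not None:
                break                         # intersected geometry identified
        if hit is not None:
            hits.append(hit)
            pending.extend(secondary_rays(ray, hit))   # step 44
    # Step 45: shade from the hits found, or fall back to a default colour
    # when no intersected geometry was found for any ray.
    return shade(hits) if hits else default_colour
```

The default colour here stands in for the “default” shading operation; a default geometry could equally be substituted at that point.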

This process is performed for each sampling position to be considered in the image plane (frame).

FIG. 5 shows an alternative ray tracing process which may be used in embodiments of the technology described herein, in which only some of the steps of the full ray tracing process described in relation to FIGS. 3 and 4 are performed. Such an alternative ray tracing process may be referred to as a “hybrid” ray tracing process.

In this process, as shown in FIG. 5, the first intersection point 50 for each sampling position in the image plane (frame) is instead determined first using a rasterisation process and stored in an intermediate data structure known as a “G-buffer” 51. Thus, the process of generating a primary ray for each sampling position, and identifying the first intersection point of the primary ray with geometry in the scene, is replaced with an initial rasterisation process to generate the “G-buffer”. The G-buffer includes information indicative of the depth, colour, normal and surface properties (and any other appropriate and desired data, e.g. albedo, etc.) for each first (closest) intersection point for each sampling position in the image plane (frame).

Secondary rays, e.g. shadow ray 52 to light source 53, and reflection ray 54, may then be cast starting from the first intersection point 50, and the shading of the sampling positions determined based on the properties of the geometry first intersected, and the interactions of the secondary rays with geometry in the scene.

Referring to the flowchart of FIG. 4, in such a hybrid process, the initial pass of steps 41, 42 and 43 of the full ray tracing process for a primary ray will be omitted, as there is no need to cast primary rays and determine their first intersection with geometry in the scene. The first intersection point data for each sampling position is instead obtained from the G-buffer.

The process may then proceed to the shading stage 45 based on the first intersection point for each pixel obtained from the G-buffer, or where secondary rays emanating from the first intersection point are to be considered, these will need to be cast in the manner described by reference to FIG. 4. Thus, steps 42, 43 and 44 will be performed in the same manner as previously described in relation to the full ray tracing process for any secondary rays.

The colour determined for a sampling position will be written to the frame buffer in the same manner as step 46 of FIG. 4, based on the shading colour determined for the sampling position based on the first intersection point (as obtained from the G-buffer), and, where applicable, the intersections of any secondary rays with objects in the scene, determined using ray tracing.
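As a rough illustration of the hybrid process, the sketch below shades each sampling position from a G-buffer record and only uses ray tracing for the secondary rays. The dictionary-based G-buffer and frame buffer, the function name ‘shade_hybrid’ and the ‘cast_secondary’ and ‘shade’ callables are assumptions made for the example and are not taken from the embodiments.

```python
def shade_hybrid(g_buffer, cast_secondary, shade):
    """Hybrid sketch: first hits come from the G-buffer, not from primary rays.

    g_buffer maps a sampling position to its first-intersection record
    (e.g. depth, colour, normal); the layout is hypothetical.
    """
    frame_buffer = {}
    for position, first_hit in g_buffer.items():
        # Steps 41-43 are skipped for the primary ray; steps 42-44 run only
        # for any secondary rays cast from the first intersection point.
        secondary_hits = cast_secondary(first_hit)
        # Steps 45-46: shade and store the colour for this sampling position.
        frame_buffer[position] = shade(first_hit, secondary_hits)
    return frame_buffer
```

In this sketch an empty ‘secondary_hits’ list corresponds to proceeding directly to the shading stage from the G-buffer data alone.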

The present embodiments relate in particular to the operation of a graphics processor when performing ray tracing-based rendering, e.g. as described above with reference to FIGS. 2-4, and in particular to the ray tracing acceleration data structure traversal and geometry intersection (steps 42-43 in FIG. 4) performed as part of the ray tracing operation.

FIG. 6 shows schematically the relevant elements and components of a graphics processor (GPU) 60 of the present embodiments. As shown in FIG. 6, the GPU 60 includes one or more shader (processing) cores 61, 62 together with a memory management unit 63 and a level 2 cache 64 which is operable to communicate with an off-chip memory system 68 (e.g. via an appropriate interconnect and (dynamic) memory controller).

FIG. 6 shows schematically the relevant configuration of one shader core 61, but as will be appreciated by those skilled in the art, any further shader cores of the graphics processor 60 will be configured in a corresponding manner.

(The graphics processor (GPU) shader cores 61, 62 are programmable processing units (circuits) that perform processing operations by running small programs for each “item” in an output to be generated such as a render target, e.g. frame. An “item” in this regard may be, e.g. a vertex, one or more sampling positions, etc. The shader cores will process each “item” by means of one or more execution threads which will execute the instructions of the shader program(s) in question for the “item” in question. Typically, there will be multiple execution threads each executing at the same time (in parallel).)

FIG. 6 shows the main elements of the graphics processor 60 that are relevant to the operation of the present embodiments. As will be appreciated by those skilled in the art there may be other elements of the graphics processor 60 that are not illustrated in FIG. 6. It should also be noted here that FIG. 6 is only schematic, and that, for example, in practice the shown functional units may share significant hardware circuits, even though they are shown schematically as separate units in FIG. 6. It will also be appreciated that each of the elements and units, etc., of the graphics processor as shown in FIG. 6 may, unless otherwise indicated, be implemented as desired and will accordingly comprise, e.g., appropriate circuits (processing logic), etc., for performing the necessary operation and functions.

As shown in FIG. 6, each shader core of the graphics processor 60 includes an appropriate programmable execution unit (execution engine) 65 that is operable to execute graphics shader programs for execution threads to perform graphics processing operations.

The shader core 61 also includes an instruction cache 66 that stores instructions to be executed by the programmable execution unit 65 to perform graphics processing operations. The instructions to be executed will, as shown in FIG. 6, be fetched from the memory system 68 via an interconnect 69 and a micro-TLB (translation lookaside buffer) 70.

The shader core 61 also includes an appropriate load/store unit 76 in communication with the programmable execution unit 65, that is operable, e.g., to load into an appropriate cache, data, etc., to be processed by the programmable execution unit 65, and to write data back to the memory system 68 (for data loads and stores for programs executed in the programmable execution unit). Again, such data will be fetched/stored by the load/store unit 76 via the interconnect 69 and the micro-TLB 70.

In order to perform graphics processing operations, the programmable execution unit 65 will execute graphics shader programs (sequences of instructions) for respective execution threads (e.g. corresponding to respective sampling positions of a frame to be rendered).

Accordingly, as shown in FIG. 6, the shader core 61 further comprises a thread creator (generator) 72 operable to generate execution threads for execution by the programmable execution unit 65.

As shown in FIG. 6, the shader core 61 also includes an intersection testing circuit 74, which is in communication with the programmable execution unit 65, and which is operable to perform the required ray-volume testing during the ray tracing acceleration data structure traversals (i.e. the operation of step 42 of FIG. 4) for rays being processed as part of a ray tracing-based rendering process, in response to messages 75 received from the programmable execution unit 65.

In the present embodiments the intersection testing circuit 74 is also operable to perform the required ray-primitive testing (i.e. the operation of step 43 of FIG. 4). The intersection testing circuit 74 is also able to communicate with the load/store unit 76 for loading in the required data for such intersection testing.

In the present embodiments, the intersection testing circuit 74 of the graphics processor is a (substantially) fixed-function hardware unit (circuit) that is configured to perform the required ray-volume and ray-primitive intersection testing during a traversal of a ray tracing acceleration data structure to determine geometry for a scene to be rendered that may be (and is) intersected by a ray being used for a ray tracing operation.

FIG. 7 shows in more detail the communication between the intersection testing circuit 74 and the shader cores 61, 62. As shown in FIG. 7, in the present embodiments, the intersection testing circuit 74 includes respective hardware circuits for performing the ray-volume testing (RT_RAY_BOX) 77 and for performing the ray-primitive testing (RT_RAY_TRI) 75. The shader cores 61, 62 thus contain appropriate message blocks 614, 616, 624, 626 for messaging the respective ray-volume testing circuit 77 and ray-primitive testing circuit 75 accordingly when it is desired to perform intersection testing during a traversal operation.

As also shown in FIG. 7, these message blocks communicate with respective register files 612, 622 of the shader cores 61, 62 so that the result of the intersection testing can be written to the register files. In particular, in the present embodiments the traversal operation is managed using a traversal stack that is maintained in a set of shared register files for a group of plural execution threads (a warp) processing rays that are performing the traversal operation.

FIG. 8 shows the stack layout in the present embodiments. As shown in FIG. 8, the traversal stack includes a list of entries 80. Each entry is associated with an indication of the next node address to be tested, e.g. in the form of a suitable pointer to the next node address 83. The leaf count field 82 is used to track whether the node corresponds to a leaf node or an internal node and hence whether to trigger ray-volume or ray-primitive testing. Another field 81 is provided that indicates which rays in the group of rays performing the traversal together should be tested for the node in question.
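One possible way to picture such a stack entry is as three bit fields packed into a single word. The sketch below is illustrative only: the field widths (a 16-bit ray mask, an 8-bit leaf count and a 40-bit node address), the field ordering and the function names ‘pack_entry’ and ‘unpack_entry’ are assumptions, as the description above does not specify an encoding.

```python
# Assumed field widths; none of these values come from the description.
MASK_BITS, LEAF_BITS, ADDR_BITS = 16, 8, 40


def pack_entry(ray_mask, leaf_count, node_address):
    """Pack field 81 (ray mask), field 82 (leaf count) and field 83 (address)."""
    if not 0 <= ray_mask < (1 << MASK_BITS):
        raise ValueError("ray mask does not fit in the assumed field width")
    if not 0 <= leaf_count < (1 << LEAF_BITS):
        raise ValueError("leaf count does not fit in the assumed field width")
    if not 0 <= node_address < (1 << ADDR_BITS):
        raise ValueError("node address does not fit in the assumed field width")
    return (ray_mask << (LEAF_BITS + ADDR_BITS)) | (leaf_count << ADDR_BITS) | node_address


def unpack_entry(entry):
    """Recover (ray_mask, leaf_count, node_address) from a packed entry."""
    node_address = entry & ((1 << ADDR_BITS) - 1)
    leaf_count = (entry >> ADDR_BITS) & ((1 << LEAF_BITS) - 1)
    ray_mask = entry >> (LEAF_BITS + ADDR_BITS)
    return ray_mask, leaf_count, node_address
```

With these assumed widths an entry occupies 64 bits, and bit i of the ray mask would indicate whether ray i of the group is to be tested against the node.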

As mentioned above, the traversal stack is in the present embodiments managed for the group of rays as a whole, via a set of shared registers allocated for the execution threads processing the rays. This can therefore help reduce memory bandwidth since the traversal stack can be managed for the group as a whole locally to the graphics processor.

FIG. 9 is a flowchart showing the operation of a shader core 61 of the graphics processor 60 when performing a ray tracing-based rendering process to render a view of the scene in an embodiment of the technology described herein.

FIG. 9 shows the operation in respect of a given sampling position of the frame being rendered. This operation will be repeated for each sampling position of the frame being rendered, and by each respective shader core that is active and being used to render the frame.

As discussed above, in the present embodiments, sampling positions are rendered by generating respective execution threads for the sampling positions and then executing appropriate shader programs for those threads. Thus, the process will start with the thread creator 72 generating an appropriate execution thread corresponding to the sampling position that is being rendered. The execution thread will then execute an initial ray tracing shader program to perform the ray tracing-based rendering process for the sampling position.

In the present embodiments, the initial ray tracing shader program that is executed for a sampling position will, inter alia, include one or more instructions that when executed trigger the programmable execution unit 65 to send a message 75 to the intersection testing circuit 74 to perform the required ray-volume or ray-primitive intersection testing between the ray in question and a given node of the BVH tree to be tested against.

In the present embodiments, the shader program is executed by a group of plural execution threads (e.g. a warp), with each execution thread performing the traversal operation for a respective ray in a group of plural rays that are thereby caused to perform the traversal operation together, as a whole. To facilitate this, the shader program to perform the traversal operation may include an initial instruction that ensures (forces) all of the execution threads in the group of execution threads to be in an ‘active’ state, e.g. such that the traversal operation can then be performed using the execution thread group as a whole, e.g. in SIMD execution state.

Thus, as shown in FIG. 9, when, during execution of the initial ray tracing shader program for a sampling position, the programmable execution unit 65 encounters and executes such an ‘Enter_SIMD_state’ instruction (step 90), at this point it can be ensured that all of the execution threads in the group of execution threads executing the program are in an active (SIMD) state.

The traversal stack that is maintained for the group of execution threads can then be suitably initialised for the traversal operation (step 91).

The first entry in the traversal stack (e.g. the root node) is then popped from the stack in order to start the traversal operation (step 92).

At this point the root node will be the only entry in the traversal stack, such that there will be no stack underflow (step 93—No) and the shader program then proceeds to determine whether the node is a leaf node or an internal node (step 94).

For the root node, and other internal nodes encountered during the traversal operation, it is then necessary to perform the required ray-volume intersection testing to determine whether the node represents any geometry that may be intersected by a ray in the group of rays that are performing the traversal operation together. This is done by including in the shader program an appropriate ray-volume testing instruction (‘RT_RAY_BOX’) that when executed (step 95) by the execution unit will trigger the execution unit to message the ray-volume intersection testing circuit 77 of the intersection testing circuit 74 to perform the desired ray-volume testing.
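The loop formed by steps 91 to 95 can be sketched in a few lines. This is an illustrative sketch only: the function name ‘traverse’, the dictionary-based stack entries and the two callables standing in for the ray-volume and ray-primitive testing are hypothetical, a non-zero leaf count is assumed to mark a leaf node, and spilling of the stack to memory is not modelled.

```python
def traverse(root_entry, ray_volume_test, ray_primitive_test):
    """Sketch of the FIG. 9 loop for a group of rays sharing one stack."""
    stack = [root_entry]                          # step 91: initialise the stack
    while stack:                                  # step 93: stop on underflow
        entry = stack.pop()                       # step 92: pop the next entry
        if entry["leaf_count"] > 0:               # step 94: leaf or internal node?
            ray_primitive_test(entry)             # leaf: ray-primitive testing
        else:
            # Step 95: ray-volume testing returns entries for the child
            # nodes that were hit, which are pushed for later testing.
            stack.extend(ray_volume_test(entry))
```

Because the stack is last-in, first-out, the child pushed last is tested first; any ordering of the pushed children (for example nearest first) would be a further design choice outside this sketch.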

FIG. 10 is a flowchart showing a ray-volume intersection testingoperation according to an embodiment of the technology described herein.As shown in FIG. 10 , when the ray-volume testing instruction(‘RT_RAY_BOX’) is executed in respect of a given node in the BVH tree, afirst ray in the group of plural rays performing the traversal operationthat need to be tested for intersection with the node (as indicated inthe appropriate field 81 in the traversal stack) is selected (step 951),and this is then iteratively tested against each child node volumeassociated with the node in question (step 952). The child node volumescan be obtained in any suitable and desired manner. In embodiments, thechild node volumes associated with a particular node are stored in anencoded manner, as will be described further below.

Thus, for each child node volume, it is determined whether the rayintersects with the child volume (step 953), and if the ray doesintersect, a hit mask for the child node (field 81 in FIG. 8 ) is setaccordingly to reflect this. If the ray does not intersect the firstchild node volume, the ray is then tested against the next child nodevolume, and so on, until the iteration for that ray over the child nodevolumes is finished (step 955). The testing then iterates over the raysthat are to be tested against the node until all of the rays have beentested against all of the child node volumes (step 956).

For each child node volume that was intersected, a result of the intersection testing is then returned, with an appropriate entry being pushed to the traversal stack such that the child node can then be tested accordingly (step 957).

As part of this, it is first tested whether the pushing of the results of the intersection testing would cause the traversal stack to overflow, i.e. because the stack is full (step 958). So long as there are available entries in the traversal stack (step 958—No) a suitable entry is then pushed to the traversal stack, with the entry including the hit mask (field 81 in FIG. 8) for the child node, as well as the leaf count and indication of the child node (fields 82 and 83 in FIG. 8).

For instance, it is then determined whether the child node is a leaf node (step 960). If the node is not a leaf node, a node index can then be calculated indicating which child nodes are associated with the node (step 961) and pushed to the traversal stack accordingly. On the other hand, if the child is a leaf node, the leaf size is then calculated (step 964), and an appropriate leaf index calculated indicating which primitives are represented by the leaf node (step 955) which is then pushed to the traversal stack.

This is done for each child node that was determined as being intersected by a ray (step 963) until respective entries for each child node have been appropriately added into the traversal stack.

The result of the intersection testing is then returned accordingly, and pushed to the traversal stack for the traversal operation. In the event that the result of the intersection testing overflows the traversal stack (step 96—Yes), the entire traversal stack is then pushed to memory (step 97), and an indication of this is recorded into the traversal stack. This can then be checked (at step 93) and in the event that there has been an overflow event (step 93—Yes), it is then checked whether the stack can be loaded from memory (step 103), and if so the stack is then loaded in appropriately (step 104), and the stack entries popped (step 92) so that the traversal operation can continue.

On the other hand, if the stack cannot be loaded from memory, for any reason, the traversal operation may be done (step 106), with the execution thread group first exiting the SIMD state (step 105) accordingly.
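The stack handling described above, including the overflow case, can be sketched as follows. This is a simplified Python sketch under stated assumptions: the ‘StackEntry’ and ‘TraversalStack’ names are hypothetical, the ‘spilled’ list stands in for main memory, and a non-empty ‘spilled’ list stands in for the overflow indication that the embodiments record into the traversal stack itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StackEntry:
    hit_mask: int    # rays in the group that hit the child (cf. field 81)
    leaf_count: int  # zero for an internal node in this sketch (cf. field 82)
    node_index: int  # indication of the child node (cf. field 83)


@dataclass
class TraversalStack:
    capacity: int
    entries: List[StackEntry] = field(default_factory=list)
    # Hypothetical stand-in for stacks written out to memory on overflow.
    spilled: List[List[StackEntry]] = field(default_factory=list)

    def push(self, entry: StackEntry) -> None:
        if len(self.entries) >= self.capacity:
            # Stack is full: write the entire stack out and start afresh.
            self.spilled.append(self.entries)
            self.entries = []
        self.entries.append(entry)

    def pop(self) -> Optional[StackEntry]:
        if not self.entries and self.spilled:
            # An overflow was recorded earlier: load the stack back in.
            self.entries = self.spilled.pop()
        if not self.entries:
            return None  # nothing left to test: traversal is done
        return self.entries.pop()
```

A traversal loop in this sketch would simply call ‘pop’ until it returns ‘None’, testing the node indicated by each entry in turn.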

The traversal stack can thus be worked through in order to test the various nodes of the BVH tree to determine which nodes represent geometry that may be intersected by the rays in the group of rays performing the traversal operation together.

When the traversal operation reaches a leaf node at the end of a given branch of the BVH tree, such that it is determined that the node is a leaf node (at step 94), with the traversal operation therefore indicating that the leaf node represents geometry that may be intersected by a ray, the actual geometry intersections are then determined.

This can be done in various ways but in the present embodiments this is done by including into the shader program an appropriate instruction (‘RT_RAY_TRI’) that when executed (step 98) by the execution unit will trigger the execution unit to message the ray-primitive intersection testing circuit 75 of the intersection testing circuit 74 to perform the desired ray-primitive testing.

FIG. 11 is a flowchart showing a ray-primitive intersection testing operation according to an embodiment of the technology described herein.

As shown in FIG. 11, in response to executing the ray-primitive intersection testing (‘RT_RAY_TRI’) instruction in respect of a leaf node, the set of primitives (e.g. triangles) represented by the leaf node is then loaded for testing.

For each primitive (triangle) represented by the leaf node (step 981), the rays that were determined to intersect the leaf node volume (as indicated by the hit mask, field 81 in FIG. 8) are then iteratively tested against the primitive (step 982) to determine whether or not the ray hits the primitive (step 983). If there are no hits, the next ray is then tested (step 985), and so on, until all of the rays have been tested against the primitive.

For any hits, it is then determined whether there is an ‘opaque’ hit (step 984). If the ray hits opaque geometry, the ray does not need to be propagated further, and so the range can then be updated accordingly (step 986). It can then be determined whether the ray is flagged to terminate on the first hit (step 987). If yes, the hit mask (field 81 in FIG. 8) can be updated appropriately (step 988) and the testing can then move on to the next ray.

Once all of the rays have been tested against the (first) primitive, it is then determined whether there were any non-opaque hits (step 989). For any rays that are determined to hit a ‘non-opaque’ primitive, the ray-primitive testing may need to terminate early, e.g., with the result being returned to the shader program accordingly, such that the shader program can determine how to handle the non-opaque hit (i.e. whether or not the hit needs to be counted). Thus, in the event that there are any non-opaque hits, the ray-primitive testing may be terminated early (without testing any more primitives), with the traversal state being updated accordingly (step 991). In that case, the ray-primitive intersection testing is terminated for all of the rays, such that the group of rays remains together for the traversal operation.

Otherwise, if there are no non-opaque hits, the ray-primitive intersection testing moves on to testing the next primitive (step 990), and iteratively tests the rays in the group of rays for intersection with that primitive, and so on, until all of the primitives for the leaf node have been tested. Once the ray-primitive intersection testing has finished, the traversal state can thus be updated accordingly with the result of the intersection testing (step 991), and the operation is then done (step 992).

The traversal operation thus uses the information provided about the rays to traverse the ray tracing acceleration data structure to determine geometry for the scene to be rendered that may be intersected by the ray in question. In the present embodiments, the traversal process operates to traverse the ray tracing acceleration data structure based on the position and direction of the ray, to determine for each volume of the scene that the ray passes through in turn, whether there is any geometry in the volume (indicated by the ray tracing acceleration data structure), until a first (potential) intersection with geometry defined for the scene is found for the ray.

Other arrangements would, of course, be possible.

The ray tracing acceleration data structure traversal for a ray can comprise traversing a single ray tracing acceleration data structure for the ray, or traversing plural ray tracing acceleration data structures for the ray (e.g. in the case where the overall volume of, and/or geometry for, the scene is represented by plural different ray tracing acceleration data structures, and/or where an initial ray tracing acceleration data structure that indicates further ray tracing acceleration data structures to be traversed is first traversed).

Once the ray tracing acceleration data structure traversal operation 74 has performed the necessary traversal or traversals for a ray, and determined geometry that is intersected by the ray, that information is returned to the programmable execution unit 65, for the programmable execution unit to perform further processing for the sampling position in question as a result of, and based on, the result of the determined traversal for the ray.

For instance, in the present embodiments, the programmable execution unit 65 may then execute further “surface processing” shader programs that will perform further processing for the sampling position in question based on the result of the ray tracing acceleration data structure traversal for the ray.

In the present embodiments, there are plural different sets of further “surface processing” shader programs that can be executed, in dependence upon the type of geometry that has been determined by the ray tracing acceleration data structure traversal circuit as being intersected by a ray (and in particular in dependence upon the particular surface type (surface property or properties) of the geometry determined by the ray tracing acceleration data structure traversal circuit).

Thus the process operates to select the further “processing” shader program to be executed to perform further processing for the sampling position corresponding to a ray in accordance with the type of geometry (and in particular the surface type), that has been determined by the ray tracing acceleration data structure traversal circuit as being intersected by the ray.

In order to perform and control this operation, in the present embodiments, the ray tracing acceleration data structure traversal circuit triggers the generation of an execution thread that is to execute (and that executes) the selected further “surface processing” shader program for the geometry type in question.

The programmable execution unit 65 then executes the selected further shader program for the generated thread (e.g. step 45 in FIG. 4).
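The selection of a further “surface processing” shader program according to surface type can be pictured as a simple dispatch table. The following Python sketch is illustrative only: the surface type names, the shader functions and the ‘hit’ dictionary are all hypothetical, and calling the selected function directly stands in for generating an execution thread to run the selected program.

```python
from typing import Callable, Dict


def shade_opaque(hit: dict) -> str:
    # Hypothetical surface processing program for opaque surfaces.
    return f"opaque surface shaded at t={hit['t']}"


def shade_transparent(hit: dict) -> str:
    # Hypothetical surface processing program for transparent surfaces.
    return f"transparent surface shaded at t={hit['t']}"


# One program per surface type (the type names are assumptions).
SURFACE_SHADERS: Dict[str, Callable[[dict], str]] = {
    "opaque": shade_opaque,
    "transparent": shade_transparent,
}


def run_surface_processing(hit: dict) -> str:
    shader = SURFACE_SHADERS[hit["surface_type"]]  # select by surface type
    return shader(hit)                             # stands in for the new thread
```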

Once the final output value for the sampling position in question has been generated, the processing in respect of that sampling position is completed. A next sampling position may then be processed in a similar manner, and so on, until all the sampling positions for the frame have been appropriately shaded. The frame may then be output, e.g. for display, and the next frame to be rendered processed in a similar manner, and so on.

As will be appreciated from the above, the ray tracing based rendering process of the present embodiments involves, inter alia, the programmable execution unit 65 of the graphics processor 60 executing appropriate shader programs to perform the ray tracing-based rendering. In the present embodiments, these shader programs are generated by a compiler (the shader compiler) 12 for the graphics processor 60, e.g. that is executing on a central processing unit (CPU), such as a host processor, of the graphics processing system (and in an embodiment as part of the driver 11 operation for the graphics processor).

The compiler (driver) will receive the high level ray tracing-based rendering shader program or programs to be executed from the application 13 that requires the ray tracing-based rendering, and then compile that program or programs into appropriate shader programs for execution by the graphics processor, and, as part of this processing, will, as discussed above, include in one or more of the compiled shader programs to be executed by the graphics processor, appropriate ‘ray-volume’ and ‘ray-primitive’ intersection testing instructions to cause the programmable execution unit to send a message to the intersection testing circuit 74 to perform the desired intersection testing.

The compilation process (the compiler) can use any suitable and desired compiler techniques for this. FIG. 12 shows an embodiment of the compilation process. As shown in FIG. 12, the compiler for the graphics processor will receive a ray tracing-based rendering program or programs for compiling (step 1100). The compiler will then analyse the shader program code that is provided, to identify instances of required intersection testing during the ray traversal operations in that shader program code (step 1101), and to insert corresponding instruction(s) at the appropriate point(s) in the compiled shader program(s) (step 1102).

The required “surface processing” operations for the intersected geometry can also be identified (step 1103) and respective “surface processing” shader programs compiled (step 1104).

The compiled shader programs will then be issued to the graphics processor for execution (e.g. stored in appropriate memory of and/or accessible to the graphics processor, so that the graphics processor can fetch the required shader programs for execution as required) (step 1105).
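As a toy illustration of steps 1101 and 1102, the following Python sketch treats a shader program as a flat list of operation names and substitutes the intersection testing instructions where intersection tests are found. The operation names ‘intersect_volume’ and ‘intersect_primitive’ and the list-based program representation are assumptions made for the example; a real compiler would of course operate on a proper intermediate representation.

```python
from typing import Dict, List

# Hypothetical high-level operations mapped to the instructions named above.
LOWERING: Dict[str, str] = {
    "intersect_volume": "RT_RAY_BOX",
    "intersect_primitive": "RT_RAY_TRI",
}


def insert_intersection_instructions(program: List[str]) -> List[str]:
    """Identify intersection tests and emit the corresponding instruction."""
    return [LOWERING.get(op, op) for op in program]
```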

It can be seen from the above that the technology described herein, in its embodiments at least, can provide a more efficient process for performing ray tracing-based rendering. This is achieved, in the embodiments of the technology described herein at least, by using an intersection testing circuit to perform ray-volume intersection testing for rays being processed, but with other processing for the ray tracing-based rendering being performed by executing an appropriate shader program or programs using a programmable execution unit of the graphics processor.

As mentioned above, in embodiments, the child node volume data is stored in an encoded manner to facilitate more efficient memory access. This in turn can further improve the efficiency of the overall process for performing ray tracing-based rendering.

For example, the main (e.g. off-chip) memory in an embodiment is configured to access data in fixed bursts/blocks of data, for example 64-byte naturally aligned blocks of data, to maximise memory access efficiency. The graphics processor cache memory (where a cache system is used) is similarly arranged, with its cache line size set to fetch blocks of data in this manner. Using the same memory access size throughout the memory system can be more efficient. To maximise memory access efficiency, the plural child node volumes associated with a given parent (non-leaf) node may therefore be stored in a single memory “block”, which is aligned to the memory access size.

An axis-aligned bounding volume (AABV) can be defined using (only) two vertices, in particular the bottom-left (x_low, y_low, z_low) and top-right (x_high, y_high, z_high) vertices. If each value is expressed as a 32-bit IEEE floating point number, each vertex (x,y,z) can then be specified using 12 bytes of data and a bounding volume uses 24 bytes of data. Using vertices expressed in this manner a 64 byte cache line would thus only store two child bounding volumes.
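The arithmetic behind these figures can be written out as a short Python calculation (the constant and variable names are chosen for the example only):

```python
BYTES_PER_FLOAT32 = 4
CACHE_LINE_BYTES = 64

bytes_per_vertex = 3 * BYTES_PER_FLOAT32   # x, y, z
bytes_per_volume = 2 * bytes_per_vertex    # bottom-left and top-right vertices
volumes_per_cache_line = CACHE_LINE_BYTES // bytes_per_volume
unused_bytes = CACHE_LINE_BYTES - volumes_per_cache_line * bytes_per_volume
```

With uncompressed vertices, 16 of the 64 bytes in each cache line would therefore go unused, since a third 24-byte volume does not fit.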

In embodiments, therefore, the child node vertices are stored in an encoded manner in which the vertices are stored relative to the parent volume bottom-left vertex and are compressed (quantised) and re-structured to reduce the amount of data that is required to store a bounding volume. For example, in embodiments, an internal (non-leaf) node in the bounding volume hierarchy comprises six child bounding volumes. By storing each of the six child bounding volumes associated with the node in a single memory block, the multiple rays in the group of plural rays that are performing the traversal at the same time can be tested against the six child nodes in one processing instance, thus reducing the number of memory access operations.

To achieve this, in embodiments, the node volume data for a parent (non-leaf) node is stored in memory as follows.

First, one of the parent vertices, for example the minimum (bottom-left) vertex, is stored, e.g. in a 24-bit floating point format that is not IEEE compliant.

For each child node for which data is to be stored, there is stored a bounding volume and a node type. The node type indicates whether the node is a leaf node, an internal node, or is not used.

The bounding volume is stored in an encoded manner. In particular, the child upper (top-right) and lower (bottom-left) vertices are stored relative to the lower (bottom-left) co-ordinate of the parent bounding volume and quantised to 8 bits. When quantising the child volume co-ordinates the data is rounded conservatively so that the child bounding volumes can only become bigger as a result of the quantisation.

The quantised child co-ordinates are then scaled, with a separate scaling factor for each axis. The scaling factor for a given axis is used for all child vertices.

Two portions of data are therefore stored for use in determining the child volume vertices: a scaling factor (which is common to all of the child nodes) and a set of quantised ‘base’ co-ordinates (which is unique to a child node).

The size of the quantised step can be determined from the size (in x,y,z) of the parent bounding volume (parentSize), as follows (in GLSL pseudocode):

const float steps_per_play = pow(2.0, 8.0) - 1.0;

Here, the scaling factor is stored as an 8-bit quantity, so there are 255 steps (i.e. 2^8 - 1).

The scaling factors can then be determined as follows:

scale = pow(vec3(2.0), max(vec3(-127.0), ceil(log2(parentSize / steps_per_play))));

For example, looking at a specific axis, the 8-bit scaling factor for that axis can be determined as:

scaling_factor_for_axis=(parentSize_for_the_axis)/255,

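where the result is rounded up to the next power of two, as in the expression for ‘scale’ above, and the exponent is clamped to a minimum value of −127.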

As mentioned above, there are two vertices to define the child bounding box, one for the lower co-ordinate and the other for the upper co-ordinate. The base co-ordinate values (i.e. the stored/compressed co-ordinates) for each child node, for each axis, can thus be calculated as follows:

-   -   For the child lower co-ordinate:

floor((childMin_for_axis−parentMin_for_axis)/scaling_factor_for_axis);

-   -   For the child upper co-ordinate:

ceil((childMax_for_axis−parentMin_for_axis)/scaling_factor_for_axis);

where ‘parentMin_for_axis’ is the minimum value along the axis for the parent bounding volume and where ‘childMin_for_axis’ and ‘childMax_for_axis’ are the lower and upper co-ordinates of the child bounding volume along the axis. Note that conservative rounding is performed using the floor/ceiling functions to ensure the quantisation can only result in the bounding volumes becoming larger.
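The per-axis encoding above can be sketched in Python as follows. The function names are chosen for the example, and returning the exponent of the power-of-two scaling factor (rather than the factor itself) is an assumption that follows the GLSL expression for ‘scale’ given above.

```python
import math

STEPS = 255.0  # 2**8 - 1 quantisation steps per axis


def axis_scale_exponent(parent_size: float) -> int:
    """Exponent e with 2**e >= parent_size / 255, clamped to at least -127."""
    if parent_size <= 0.0:
        return -127  # degenerate parent: use the smallest permitted exponent
    return max(-127, math.ceil(math.log2(parent_size / STEPS)))


def encode_axis(child_min: float, child_max: float,
                parent_min: float, exponent: int) -> tuple:
    """Quantise one axis of a child volume, rounding conservatively."""
    scale = 2.0 ** exponent
    base_lo = math.floor((child_min - parent_min) / scale)  # can only move down
    base_hi = math.ceil((child_max - parent_min) / scale)   # can only move up
    return base_lo, base_hi
```

For example, for a parent that spans 10 units along an axis the exponent is -4 (a step of 0.0625), and a child spanning 1.3 to 4.2 relative to the parent minimum is stored as the base values 20 and 68.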

This encoding is performed for the co-ordinate values for each child node for each of the axes in order to determine the suitable set of base co-ordinate values that are to be stored for the child node. When the child node volume is required, the data must therefore be suitably decoded, by reversing the encoding described above, i.e. by re-scaling the base values using the appropriate scaling factors to recover the ‘true’ values, as follows:

modified_co-ordinate_for_axis=base_co-ordinate_for_axis*scaling_factor_for_axis+parentMin_for_axis

where ‘modified_co-ordinate_for_axis’ is the actual co-ordinate that is used when determining the child node volume, and where ‘base_co-ordinate_for_axis’ is the stored/compressed co-ordinate value.
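The decoding step can likewise be sketched in Python. As before, the function name is hypothetical, and passing the scaling factor as a power-of-two exponent is an assumption made for the example.

```python
def decode_axis(base_lo: int, base_hi: int,
                parent_min: float, exponent: int) -> tuple:
    """Recover the (conservative) child co-ordinates for one axis."""
    scale = 2.0 ** exponent
    return (base_lo * scale + parent_min, base_hi * scale + parent_min)
```

For instance, base values of 20 and 68 with an exponent of -4 and a parent minimum of 0 decode to 1.25 and 4.25. If the original child extent was 1.3 to 4.2, the decoded volume fully encloses it, which is the effect of the conservative rounding noted above.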

This encoding can facilitate more efficient storing of the child bounding volume data in memory. For instance, an example data structure storing the child bounding volumes for a given internal (non-leaf) node according to an embodiment is illustrated in FIG. 13.

In particular, FIG. 13 shows a 64 byte data structure 1300 which is arranged as 16 lines with each line able to store 32 bits of data, such that the data structure is aligned with the size of the cache lines and memory transactions (i.e. 64 bytes). In FIG. 13, child bounding volume data is stored for six different child nodes. Thus, in this embodiment, as shown in FIG. 13, all of the data for the six child nodes can easily fit within a single 64 byte data structure, such that the six child bounding volumes can all be obtained in a single memory transaction.

As shown in FIG. 13, the data structure 1300 stores (in this example in the first three lines although other arrangements would of course be possible) the respective co-ordinates for the (e.g.) minimum (bottom-left) parent vertex (i.e. the parentMin_for_axis values) as p_x, p_y, p_z. In this example, as shown in FIG. 13, the parent vertex co-ordinates p_x, p_y, p_z are each stored as 24-bit floating point values. Thus, 9 bytes of data are used for storing the parent co-ordinates for the three axes. Other arrangements would of course be possible.

As mentioned above, the child node co-ordinates are encoded relative to the parent vertex co-ordinates, i.e. relative to p_x, p_y, p_z. However, rather than storing the differences in full, the differences are instead encoded using the scaling factors as described above, such that for each child node bounding volume there is stored (for each axis) a base co-ordinate value, to which the per axis modifiers (scaling factors) can be applied to determine the actual co-ordinate value relative to the parent vertex co-ordinates.

The data structure 1300 accordingly also stores respective modifier values in the form of respective scaling factors for each of the x,y,z axes: scale_x, scale_y, scale_z. In this example, each scaling factor is stored as the exponent of a 32-bit floating point value for the axis (and therefore as an 8-bit unsigned value), such that 3 bytes are used to store the scaling factors, but other arrangements would of course be possible. These scaling factors are applied to the base co-ordinate values for all of the child nodes.

The data structure 1300 also stores the base co-ordinate values for each of the child nodes, where ‘lo_x0, hi_x0’ are 8-bit values for the lower and upper co-ordinates of the bounding volume for child node ‘0’ along the x axis, and where ‘lo_y0, hi_y0’ and ‘lo_z0, hi_z0’ are the corresponding co-ordinates for child node ‘0’ along the y and z axes, and so on for the other child nodes (of which in this example there are six, numbered respectively ‘0’ through ‘5’). Thus, for each child node, there are stored six 8-bit base co-ordinate values, totalling 6 bytes per child node. In total therefore, 36 bytes are used for storing the base co-ordinate values for the six child nodes, although other arrangements would of course be possible.

The data structure 1300 may further store any other data that may desirably be stored in respect of the nodes. For instance, as shown in FIG. 13, the data structure 1300 further stores, in respect of each node, a respective node type value nt0, . . . , nt5. The node type value indicates a type of child node, e.g. whether the child node is a leaf node containing primitives, whether the child node is another internal (non-leaf) node, or whether the child node is not used. Further, it will be appreciated that other metadata may also be stored as part of the same data structure 1300. For instance, in the data structure 1300 shown in FIG. 13 there are a number of spare bits that can be used for storing any suitable and desired data (or metadata) that may desirably be stored in respect of the node in question.

It will be appreciated that storing the child node volume data in this way can therefore allow for a more efficient packing of the data in memory, e.g. such that in the example shown in FIG. 13 child node volume data for up to six child nodes can be stored within a single 64-byte data structure that can be obtained in a single memory transaction. Storing the child node volume data in this encoded manner can therefore be very efficient. The node data can then be obtained from memory and processed as part of the ray tracing traversal operation, e.g. as described above.
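To make the byte budget concrete, the following Python sketch packs such a node into 64 bytes. It is not the layout of FIG. 13, whose exact field order is not reproduced here: the field order, the 2-bit node type codes, the bias of 127 applied to the exponents, and the 24-bit format (taken here as simply the top 24 bits of a 32-bit float) are all assumptions made for the example.

```python
import struct
from typing import Sequence

# Hypothetical 2-bit node type codes.
NODE_UNUSED, NODE_INTERNAL, NODE_LEAF = 0, 1, 2


def float_to_f24(value: float) -> bytes:
    """Keep the top 24 bits of a float32 (an assumed 24-bit format).

    Dropping the low mantissa bits truncates toward zero; a real encoder
    would presumably round so that the parent minimum never moves inward.
    """
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return (bits >> 8).to_bytes(3, "little")


def pack_node(parent_min: Sequence[float], exponents: Sequence[int],
              children: Sequence[Sequence[int]],
              node_types: Sequence[int]) -> bytes:
    out = bytearray()
    for v in parent_min:                 # p_x, p_y, p_z: 9 bytes
        out += float_to_f24(v)
    for e in exponents:                  # scale_x, scale_y, scale_z: 3 bytes
        out.append(e + 127)              # biased exponent as an unsigned byte
    for child in children:               # six children x six 8-bit values: 36 bytes
        out += bytes(child)              # (lo_x, hi_x, lo_y, hi_y, lo_z, hi_z)
    type_bits = 0
    for i, t in enumerate(node_types):   # nt0..nt5, two bits each
        type_bits |= (t & 0b11) << (2 * i)
    out += type_bits.to_bytes(2, "little")
    out += bytes(64 - len(out))          # remaining spare bytes
    return bytes(out)
```

Under these assumptions the parent vertex, scaling factors, base co-ordinates and node types occupy 50 bytes, leaving 14 bytes spare within the 64-byte block.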

FIG. 14 is a flow chart illustrating how the child node volume data may be obtained according to an embodiment. In particular, at some point during a ray tracing operation, it may be determined that child node volume data is required for an internal (non-leaf) node, e.g. that is to be tested as part of the traversal operation (step 1400). The graphics processor may thus issue a request to the memory system for the required child node volume data (step 1401). If the data is already present locally, e.g. within a cache, it can be fetched from that location accordingly. On the other hand, if the data is not present locally, it must be obtained from memory.

In the present embodiment the data is stored in memory in an encoded form, as described above and as shown in FIG. 13. Thus, the child node volume data is in the present embodiments obtained from memory in such encoded form (step 1402), e.g. by reading in the data structure 1300 shown in FIG. 13. The child node volume data must therefore be decoded, e.g. by applying the scaling factors to the respective base co-ordinate values, as described above, in order to determine the associated child node bounding volume (step 1403). The decoded bounding volume can then be used, e.g., for the ray-volume intersection testing described above, as part of the ray tracing operation.
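The flow of FIG. 14 can be summarised by the following Python sketch, in which dictionaries stand in for the cache and for main memory and the decoder is supplied by the caller. All of the names are hypothetical, and a real memory system would of course involve cache line allocation and eviction that this sketch ignores.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def fetch_child_volumes(address: int, cache: Dict[int, bytes],
                        memory: Dict[int, bytes],
                        decode: Callable[[bytes], T]) -> T:
    block = cache.get(address)       # request the data; is it present locally?
    if block is None:
        block = memory[address]      # not cached: read the encoded block
        cache[address] = block       # keep it locally for later requests
    return decode(block)             # decode to obtain the bounding volumes
```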

In embodiments, the primitives represented by a respective leaf node of the BVH are also stored in memory in an efficient manner to facilitate improved memory access. For example, a primitive is often expressed as a triangle, and therefore has three vertices, (x0, y0, z0), (x1, y1, z1) and (x2, y2, z2). Therefore, where each co-ordinate is expressed as a 32-bit floating point number, each vertex is specified using 12 bytes of data and a primitive is specified using 36 bytes of data. To facilitate memory access, in embodiments, a plurality of primitives are thus stored together in a single data structure that fits within an integer number of cache lines. Again, this has the benefit that, by storing a plurality of primitives in a BVH leaf node, multiple rays in the group of plural rays that are performing the traversal at the same time can be tested against multiple primitives in one processing instance, thus reducing the number of memory access operations.

For example, in embodiments, a leaf node may comprise three primitives (triangles). In that case, each triangle comprises three vertices, each vertex comprising three 32-bit floating point numbers. Each triangle also comprises validity and opaqueness fields. These fields indicate whether the corresponding triangle is valid (is used), and if so, whether the triangle is opaque.

FIG. 15 shows an example of a data structure 1500 for storing such data in memory. In particular, FIG. 15 shows a 128 byte data structure comprising 32 lines each capable of storing 32 bits. This data structure can therefore fit within two 64 byte cache lines. As shown in FIG. 15, the primitive vertices are stored as 32-bit floating point co-ordinate values for each axis, where ‘tri_0_vertex_0_x’ represents the x co-ordinate of the first vertex (vertex 0) for the first primitive (triangle 0), ‘tri_0_vertex_0_y’ and ‘tri_0_vertex_0_z’ are the corresponding y and z co-ordinates, and so on. Thus, as shown in FIG. 15, for each primitive there are stored three vertices, with three co-ordinates (x,y,z) being stored for each vertex. The vertex data for the leaf node therefore comprises:

3 (number of triangles) * 3 (vertices per triangle) * 3 (axes per vertex) * 4 bytes (32-bit data type) = 108 bytes.

This can therefore fit within two 64 byte cache lines. For instance, FIG. 15 shows a data structure 1500 for storing . . .

The following fields have been described above:
-   -   tri_*_vertex_*_x/y/z: the 32-bit floating point co-ordinates of the vertices of the three primitives (triangles).
-   -   V0, V1, V2: indicates whether the primitive is valid.
-   -   O0, O1, O2: indicates whether the primitive is opaque.
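The leaf node layout can be sketched in Python in the same way. The placement of the validity and opaqueness flags in a single byte immediately after the vertex data is an assumption made for the example, as is the function name; only the 108 bytes of vertex data and the 128-byte total follow from the description above.

```python
import struct
from typing import Sequence


def pack_leaf(triangles: Sequence[Sequence[Sequence[float]]],
              valid: Sequence[bool], opaque: Sequence[bool]) -> bytes:
    """Pack 3 triangles x 3 vertices x (x, y, z) plus flags into 128 bytes."""
    coords = [c for tri in triangles for vertex in tri for c in vertex]
    if len(coords) != 27:
        raise ValueError("expected three triangles of three (x, y, z) vertices")
    data = struct.pack("<27f", *coords)                 # 27 * 4 = 108 bytes
    flags = 0
    for i in range(3):
        flags |= (1 if valid[i] else 0) << i            # V0, V1, V2
        flags |= (1 if opaque[i] else 0) << (3 + i)     # O0, O1, O2
    data += bytes([flags])                              # assumed flag position
    return data + bytes(128 - len(data))                # pad to two cache lines
```

A leaf with fewer than three triangles would, in this sketch, simply clear the corresponding validity bits and leave those vertex slots unused.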

Other arrangements would of course be possible. The foregoing detailed description has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the technology to the precise form disclosed. Many modifications and variations are possible in the light of the above teaching. The described embodiments were chosen in order to best explain the principles of the technology and its practical application, to thereby enable others skilled in the art to best utilise the technology in various embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope be defined by the claims appended hereto.

1. A method of operating a graphics processor when rendering a frame that represents a view of a scene comprising one or more objects using a ray tracing process, wherein the ray tracing process uses a ray tracing acceleration data structure indicative of the distribution of geometry for the scene to be rendered to determine geometry for the scene that may be intersected by a ray being used for a ray tracing operation, the ray tracing acceleration data structure comprising a plurality of nodes, each node associated with a respective one or more volumes within the scene, the ray tracing process comprising performing for a plurality of rays a traversal of the ray tracing acceleration data structure to determine, by testing the rays for intersection with the volumes represented by the nodes of the acceleration data structure, geometry for the scene to be rendered that may be intersected by the rays; the graphics processor comprising a programmable execution unit operable to execute programs to perform graphics processing operations, and in which a program can be executed by groups of plural execution threads together; the method comprising: when a group of execution threads is executing a program to perform a ray tracing acceleration data structure traversal, with individual execution threads in the group of execution threads performing a traversal operation for a respective ray in a corresponding group of rays such that the group of rays performs the traversal operation together, in response to the execution threads executing a set of one or more ray-volume testing instructions that are included in the program in respect of a node of the ray tracing acceleration data structure: testing one or more rays from the group of plural rays that are performing the traversal operation together for intersection with the one or more volumes associated with the node being tested; and returning a result of the intersection testing for the node for the traversal operation.
2. The method of claim 1, wherein the graphics processor further comprises an intersection testing circuit operable to test rays for intersection with the volumes associated with the nodes of the ray tracing acceleration data structure, and wherein the set of one or more ray-volume testing instructions, when executed by execution threads of the group of plural execution threads, will cause the execution unit to message the intersection testing circuit to perform the testing of the one or more rays from the group of plural rays that are performing the traversal of the ray tracing acceleration data structure together for intersection with the one or more volumes associated with the node to be tested and to return the result of the intersection testing for the node being tested to the execution unit.
3. The method of claim 1, wherein the ray tracing acceleration data structure comprises a tree structure comprising a plurality of branches associated with a respective plurality of leaf nodes, wherein each non-leaf node in the tree structure is a parent node for a respective set of plural child nodes, each non-leaf node thereby being associated with a corresponding plurality of child volumes, and wherein testing rays for intersection with the volume associated with a node comprises testing the rays for intersection with the volumes for each of the respective set of child nodes for the node being tested, the method comprising returning a result of the intersection testing for each of the child nodes of the node being tested.
4. The method of claim 1, wherein a traversal record is maintained in order to manage the traversal operation, wherein the program is operable to work through the entries in the traversal record to determine which nodes should be tested, and wherein the result of the intersection testing comprises an indication of which node or nodes are to be tested during the traversal operation, the indication being written to the traversal record.
5. The method of claim 4, wherein the traversal record comprises a traversal stack.
6. The method of claim 4, wherein the result of the intersection testing also comprises an indication of which of the one or more rays that were tested against the node were determined to intersect the one or more volumes associated with the node.
7. The method of claim 4, wherein when writing the result of the intersection testing would cause overflow of the traversal record, the whole traversal record is written out to memory, and an indication of this is written into the traversal record to allow the execution unit to subsequently load in the traversal record that was written out.
8. The method of claim 4, wherein the traversal record is managed via a set of shared registers allocated for the group of plural execution threads that are executing the program to perform the traversal operation for the group of plural rays.
 9. The method of claim 1, wherein the program also includes a set of one or more instructions that when executed cause the execution threads in the group of execution threads to be in an active state at least until the traversal operation to determine which, if any, geometry for the scene may be intersected by the rays is finished for all of the rays in the group of rays being processed by the group of execution threads, such that the group of rays performs the traversal operation together.
 10. A method of compiling a shader program to be executed by a programmable execution unit of a graphics processor that is operable to execute graphics processing programs to perform graphics processing operations; the method comprising: including in a shader program to be executed by a programmable execution unit of a graphics processor when rendering a frame that represents a view of a scene comprising one or more objects using a ray tracing process, wherein the ray tracing process uses a ray tracing acceleration data structure indicative of the distribution of geometry for the scene to be rendered to determine geometry for the scene that may be intersected by a ray being used for a ray tracing operation, the ray tracing acceleration data structure comprising a plurality of nodes, each node associated with a respective one or more volumes within the scene, the ray tracing process comprising performing for a plurality of rays a traversal of the ray tracing acceleration data structure to determine, by testing the rays for intersection with the volumes represented by the nodes of the acceleration data structure, geometry for the scene to be rendered that may be intersected by the rays, and wherein the program is to be executed by a group of plural execution threads, with individual execution threads in the group of execution threads performing a traversal operation for a respective ray in a corresponding group of rays: a set of one or more ray-volume testing instructions for testing rays for intersection with the one or more volumes associated with a given node of the ray tracing acceleration data structure that is to be tested during the traversal operation, which set of ray-volume testing instructions, when executed by execution threads of the group of plural execution threads, will cause: the graphics processor to test one or more rays from the group of plural rays that are performing the traversal operation together for intersection with the one or more volumes associated with the node being tested; and a result of the intersection testing to be returned for the node for the traversal operation.
 11. A graphics processor that is operable to render a frame that represents a view of a scene comprising one or more objects using a ray tracing process, wherein the ray tracing process uses a ray tracing acceleration data structure indicative of the distribution of geometry for the scene to be rendered to determine geometry for the scene that may be intersected by a ray being used for a ray tracing operation, the ray tracing acceleration data structure comprising a plurality of nodes, each node associated with a respective one or more volumes within the scene, the ray tracing process comprising performing for a plurality of rays a traversal of the ray tracing acceleration data structure to determine, by testing the rays for intersection with the volumes represented by the nodes of the acceleration data structure, geometry for the scene to be rendered that may be intersected by the rays; the graphics processor comprising: a programmable execution unit operable to execute programs to perform graphics processing operations, and in which a program can be executed by groups of plural execution threads together; wherein the execution unit is configured such that, when a group of execution threads is executing a program to perform a ray tracing acceleration data structure traversal, with individual execution threads in the group of execution threads performing a traversal operation for a respective ray in a corresponding group of rays that are thereby performing the traversal operation together, in response to the execution threads executing a set of one or more ray-volume testing instructions included in the program in respect of a node of the ray tracing acceleration data structure: the execution unit triggers testing of one or more rays from the group of plural rays that are performing the traversal of the ray tracing acceleration data structure together for intersection with the one or more volumes associated with the node being tested, wherein a result of the intersection testing is then returned for the node for the traversal operation.
 12. The graphics processor of claim 11, wherein the graphics processor further comprises an intersection testing circuit operable to test rays for intersection with the volumes associated with the nodes of the ray tracing acceleration data structure, and wherein the set of one or more ray-volume testing instructions, when executed by execution threads of the group of plural execution threads, will cause the execution unit to message the intersection testing circuit to perform the testing of the one or more rays from the group of plural rays that are performing the traversal of the ray tracing acceleration data structure together for intersection with the one or more volumes associated with the node to be tested and to return the result of the intersection testing for the node being tested to the execution unit.
 13. The graphics processor of claim 11, wherein the ray tracing acceleration data structure comprises a tree structure comprising a plurality of branches associated with a respective plurality of leaf nodes, wherein each non-leaf node in the tree structure is a parent node for a respective set of plural child nodes, each non-leaf node thereby being associated with a corresponding plurality of child volumes, and wherein testing rays for intersection with the volume associated with a node comprises testing the rays for intersection with the volumes for each of the respective set of child nodes for the node being tested.
 14. The graphics processor of claim 11, wherein a traversal record is maintained in order to manage the traversal operation, wherein the program is operable to work through the entries in the traversal record to determine which nodes should be tested, and wherein the result of the intersection testing comprises an indication of which node or nodes are to be tested during the traversal operation, the indication being added to the traversal record.
 15. The graphics processor of claim 14, wherein the traversal record comprises a traversal stack.
 16. The graphics processor of claim 14, wherein the result of the intersection testing also comprises an indication of which of the one or more rays that were tested against the node were determined to intersect the one or more volumes associated with the node.
 17. The graphics processor of claim 14, wherein when adding the result of the intersection testing would cause overflow of the traversal record, the whole traversal record is written out to memory, and an indication of this is written into the traversal record to allow the execution unit to subsequently load in the traversal record that was written out.
 18. The graphics processor of claim 14, wherein the traversal record is managed via a set of shared registers allocated for the group of plural execution threads that are executing the program to perform the traversal operation for the group of plural rays.